Trimming a string with <= 2 characters - algorithm

Suppose you are given an input string:
"my name is vikas"
Suggest an algorithm to modify it to:
"name vikas"
That is, remove the words of length <= 2, or <= k characters to make it generic.

I think you can do this in-place in O(n) time. Iterate over the string, keeping a pointer to the beginning of the word you're processing. If you find that the length of the word is greater than k, you overwrite the beginning of the string with this word. Here's the C code (it assumes that words are separated by exactly one space):
#include <stdio.h>
#include <string.h>
void modify(char *s, int k){
int n = strlen(s);
int j = 0, cnt = 0, r = 0, prev = -1;
s[n++] = ' '; // Sentinel (overwrites the '\0') to avoid a special case for the last word
for(int i=0; i<n; i++){
if(s[i] == ' '){
if (cnt > k){
if(r > 0) s[r++] = ' ';
while(j < i) s[r++] = s[j++];
}
cnt = 0;
}
else {
if (prev == ' ') j = i;
cnt++;
}
prev = s[i];
}
s[r] = '\0';
}
int main(){
char s[] = "my name is vikas";
modify(s, 2);
printf("%s\n", s);
}

"a short sentence of words" split ' ' filter {_.length > 2} mkString " "
(Scala)

Iterate over the individual characters of the String, keeping the current position in the string and the "current word", accumulate all words with length > k, and reassemble the String from the accumulated words?
This algorithm uses in-place rewriting and minimizes the number of copies between elements:
final int k = 2;
char[] test = " my name is el jenso ".toCharArray();
int l = test.length;
int pos = 0;
int cwPos = 0;
int copyPos = 0;
while (pos < l)
{
if (Character.isWhitespace(test[pos]))
{
int r = pos - cwPos;
if (r - 1 < k)
{
copyPos -= r;
cwPos = ++pos;
}
else
{
cwPos = ++pos;
test[copyPos++] = ' ';
}
}
else
{
test[copyPos++] = test[pos++];
}
}
System.out.println(new String(test, 0, copyPos));

split() by " " and omit if length() <= 2
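A minimal Python sketch of the split-and-filter idea. The function name `drop_short_words` and the default `k=2` are illustrative choices, not something from the original posts.

```python
def drop_short_words(text, k=2):
    """Return text without the words whose length is <= k."""
    # str.split() with no argument also collapses repeated whitespace.
    return " ".join(word for word in text.split() if len(word) > k)


print(drop_short_words("my name is vikas"))  # name vikas
```

This allocates a list of words, so it is the simple version rather than the in-place one.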

Something like that will suffice (time complexity is optimal, I guess):
string.Join(" ", input
.Split(' ')
.Where(s => s.Length > k))
What about space complexity? Well, this can run in O(k) extra space (not counting the input and output, of course), if you think about it. It won't in .NET, because Split builds an actual array. But you can build iterators instead. And if you imagine the string is just an iterator over characters, it becomes an O(1)-space algorithm.
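The iterator idea could look roughly like this in Python. It is a sketch under the assumption that words are separated by single space characters; `filter_stream` is a made-up name. The generator buffers at most k + 1 characters of the current word before it knows whether to emit it.

```python
def filter_stream(chars, k):
    """Yield the characters of the output, dropping words of length <= k."""
    buffer = []          # first characters of the current word (at most k + 1)
    emitting = False     # True once the current word is known to be longer than k
    wrote_word = False   # True once at least one word has been emitted
    for ch in chars:
        if ch == " ":
            buffer.clear()
            emitting = False
        elif emitting:
            yield ch
        else:
            buffer.append(ch)
            if len(buffer) > k:
                if wrote_word:
                    yield " "  # separator before every word except the first
                yield from buffer
                buffer.clear()
                emitting = True
                wrote_word = True


print("".join(filter_stream("my name is vikas", 2)))  # name vikas
```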

Time limit exceeded in my code given below

Question:
Lapindrome is defined as a string which when split in the middle, gives two halves having the same characters and same frequency of each character. If there are odd number of characters in the string, we ignore the middle character and check for lapindrome. For example gaga is a lapindrome, since the two halves ga and ga have the same characters with same frequency. Also, abccab, rotor and xyzxy are a few examples of lapindromes. Note that abbaab is NOT a lapindrome. The two halves contain the same characters but their frequencies do not match.
Your task is simple. Given a string, you need to tell if it is a lapindrome.
Input:
First line of input contains a single integer T, the number of test cases.
Each test is a single line containing a string S composed of only lowercase English alphabet.
Output:
For each test case, output on a separate line: "YES" if the string is a lapindrome and "NO" if it is not.
Constraints:
1 ≤ T ≤ 100
2 ≤ |S| ≤ 1000, where |S| denotes the length of S
#include <stdio.h>
#include <string.h>
int found;
int lsearch(char a[], int l, int h, char p) {
int i = l;
for (i = l; i <= h; i++) {
if (a[i] == p) {
found = 0;
return i;
}
}
return -1;
}
int main() {
char s[100];
int q, z, i, T;
scanf("%d", &T);
while (T--) {
q = 0;
scanf("%s", &s);
if (strlen(s) % 2 == 0)
for (i = 0; i < (strlen(s) / 2); i++) {
z = lsearch(s, strlen(s) / 2, strlen(s) - 1, s[i]);
if (found == 0) {
found = -1;
s[z] = -2;
} else
q = 1;
} else
for (i = 0; i < (strlen(s) / 2); i++) {
z = lsearch(s, 1 + (strlen(s) / 2), strlen(s) - 1, s[i]);
if (found == 0) {
found = -1;
s[z] = -2;
} else
q = 1;
}
if (strlen(s) % 2 == 0)
for (i = (strlen(s) / 2); i < strlen(s); i++) {
if (s[i] != -2)
q = 1;
} else
for (i = (strlen(s) / 2) + 1; i < strlen(s); i++) {
if (s[i] != -2)
q = 1;
}
if (q == 1)
printf("NO\n");
else
printf("YES\n");
}
}
I am getting the correct output in Code::Blocks, but the CodeChef judge says "time limit exceeded". Please tell me why.
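For each of the O(n) characters you do an O(n) search, leading to an O(n^2) algorithm. Throw a thousand-character string at it, and it is too slow. Note also that s is declared as char s[100] while |S| can be as large as 1000, so long inputs overflow the buffer.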
This is solvable in two standard ways. The first is to sort each half of the string and then compare. The second is to create hash tables for letter frequency and then compare.
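A short Python sketch of the second approach (letter-frequency tables), using `collections.Counter` as the hash table. The function name `is_lapindrome` is illustrative.

```python
from collections import Counter


def is_lapindrome(s):
    half = len(s) // 2
    # Taking the last `half` characters skips the middle one for odd lengths.
    return Counter(s[:half]) == Counter(s[len(s) - half:])


for word in ["gaga", "abccab", "rotor", "xyzxy", "abbaab"]:
    print(word, "YES" if is_lapindrome(word) else "NO")
```

Each string is scanned once, so the work per test case is linear in |S|.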

Run length encoding using O(1) space

Can we do run-length encoding in place (assuming the input array is very large)?
We can do it for cases such as AAAABBBBCCCCDDDD, which becomes
A4B4C4D4
But how do we do it for a case such as ABCDEFG,
where the output would be A1B1C1D1E1F1G1?
My first thought was to start encoding from the end, so we will use the free space (if any), after that we can shift the encoded array to the start. A problem with this approach is that it will not work for AAAAB, because there is no free space (it's not needed for A4B1) and we will try to write AAAAB1 on the first iteration.
Below is a corrected solution:
(let's assume the sequence is AAABBC)
encode all groups with two or more elements and leave the rest unchanged (this will not increase length of the array) -> A3_B2C
shift everything right eliminating empty spaces after first step -> _A3B2C
encode the array from the start (reusing the already encoded groups of course) -> A3B2C1
Every step is O(n) and as far as I can see only constant additional memory is needed.
Limitations:
Digits are not supported, but that anyway would create problems with decoding as Petar Petrov mentioned.
We need some kind of "empty" character, but this can be worked around by adding zeros: A03 instead of A3_
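The three steps above can be sketched in Python on a list of characters. This is only an illustration under explicit assumptions: the buffer `buf` is at least as long as the final encoding, the first `n` cells hold the input, the input contains no digits, and `None` plays the role of the "empty" character. The name `rle_in_place` is made up.

```python
def rle_in_place(buf, n):
    """Encode buf[:n] in place and return the encoded length."""
    cap = len(buf)

    # Step 1: encode runs of length >= 2, copy single characters unchanged.
    w = i = 0
    while i < n:
        j = i
        while j < n and buf[j] == buf[i]:
            j += 1
        ch, run = buf[i], j - i
        buf[w] = ch
        w += 1
        if run >= 2:
            for digit in str(run):
                buf[w] = digit
                w += 1
        i = j
    m = w  # length of the partially encoded data

    # Step 2: shift the partial encoding to the right end of the buffer.
    for t in range(m - 1, -1, -1):
        buf[cap - m + t] = buf[t]

    # Step 3: re-encode from the start, adding the missing "1" counts.
    r, w = cap - m, 0
    while r < cap:
        buf[w] = buf[r]
        w, r = w + 1, r + 1
        if r < cap and buf[r].isdigit():
            while r < cap and buf[r].isdigit():
                buf[w] = buf[r]
                w, r = w + 1, r + 1
        else:
            buf[w] = "1"
            w += 1
    for t in range(w, cap):
        buf[t] = None  # mark the unused tail as empty
    return w


data = list("AAABBC")
length = rle_in_place(data, 6)
print("".join(data[:length]))  # A3B2C1
```

Each step is a single pass, and apart from the short digit string for a count only a few indices are kept.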
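C++ solution, O(n) time and O(1) space (it only works when every run has length 2 or more, so that the write index never overtakes the read index):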
string runLengthEncode(string str)
{
int len = str.length();
int j=0,k=0,cnt=0;
for(int i=0;i<len;i++)
{
j=i;
cnt=1;
while(i<len-1 && str[i]==str[i+1])
{
i++;
cnt++;
}
str[k++]=str[j];
string temp =to_string(cnt);
for(auto m:temp)
str[k++] = m;
}
str.resize(k);
return str;
}
null is used to indicate which items are empty; they are ignored when encoding. Also, you can't encode digits: AAA2222 would become A324, which reads as 324 times 'A' when it is meant to be A3 followed by 24. Your question opens more questions.
Here's a "solution" in C#
public static void Encode(string[] input)
{
var writeIndex = 0;
var i = 0;
while (i < input.Length)
{
var symbol = input[i];
if (symbol == null)
{
break;
}
var nextIndex = i + 1;
var offset = 0;
var count = CountSymbol(input, symbol, nextIndex) + 1;
if (count == 1)
{
ShiftRight(input, nextIndex);
offset++;
}
input[writeIndex++] = symbol;
input[writeIndex++] = count.ToString();
i += count + offset;
}
Array.Clear(input, writeIndex, input.Length - writeIndex);
}
private static void ShiftRight(string[] input, int nextIndex)
{
var count = CountSymbol(input, null, nextIndex, (a, b) => a != b);
Array.Copy(input, nextIndex, input, nextIndex + 1, count);
}
private static int CountSymbol(string[] input, string symbol, int nextIndex)
{
return CountSymbol(input, symbol, nextIndex, (a, b) => a == b);
}
private static int CountSymbol(string[] input, string symbol, int nextIndex, Func<string, string, bool> cmp)
{
var count = 0;
var i = nextIndex;
while (i < input.Length && cmp(input[i], symbol))
{
count++;
i++;
}
return count;
}
The first solution does not handle single characters. For example, 'Hi!' will not work. I've used a totally different approach, using the insert() function to add the counts in place. This takes care of everything, whether the run length is greater than 10, greater than 100, or equal to 1.
#include<iostream>
#include<algorithm>
using namespace std;
int main(){
string name = "Hello Buddy!!";
int start = 0;
char distinct = name[0];
for(int i=1;i<name.length()+1;){
if(distinct!=name[i]){
string s = to_string(i-start);
name.insert(start+1,s);
name.erase(name.begin() + start + 1 + s.length(),name.begin() + s.length() + i);
i=start+s.length()+1;
start=i;
distinct=name[start];
continue;
}
i++;
}
cout<<name;
}
Let me know if you find anything incorrect.
O(n), in-place RLE; I couldn't think of anything better than this. It does not write a number if a character occurs just once. It also writes a9a2 if a character occurs 11 times in a row.
void RLE(char *str) {
int len = strlen(str);
int count = 1, j = 0;
for (int i = 0; i < len; i++){
if (str[i] == str[i + 1])
count++;
else {
int times = count / 9;
int rem = count % 9;
for (int k = 0; k < times; k++) {
str[j++] = str[i];
_itoa(9, &str[j++], 10);
count = count - 9;
}
if (count > 1) {
str[j++] = str[i];
_itoa(rem, &str[j++], 10);
count = 1;
}
else
str[j++] = str[i];
}
}
cout << str;
}
I/P => aaabcdeeeefghijklaaaaa
O/P => a3bcde4fghijkla5
In-place solution using C++ (assumes the encoded string is not longer than the original string):
#include <bits/stdc++.h>
#include<stdlib.h>
using namespace std;
void replacePattern(char *str)
{
int len = strlen(str);
if (len == 0)
return;
int i = 1, j = 1;
int count;
// for each character
while (str[j])
{
count = 1;
while (str[j] == str[j-1])
{
j = j + 1;
count++;
}
// write the run length, most significant digit first
string digits = to_string(count);
for (char d : digits)
str[i++] = d;
// copy character at current position j
// to position i and increment i and j
if (str[j])
str[i++] = str[j++];
}
// add a null character to terminate string
if(str[len-1] != str[len-2]) {
str[i] = '1';
i++;
}
str[i] = '\0';
}
// Driver code
int main()
{
char str[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabccccc";
replacePattern(str);
cout << str;
return 0;
}

Split a string to a string of valid words using Dynamic Programming

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:
You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.
Give a dynamic programming algorithm that determines whether the string s[*] can be reconstituted as a sequence of valid words. The running time should be at most O(n^2), assuming that each call to dict takes unit time.
In the event that the string is valid, make your algorithm output the corresponding sequence of words.
Let the length of your compacted document be N.
Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.
b(N) is true (since the empty string can be split into 0 words).
Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.
Eventually, you compute b(0) which tells you if the entire document can be split into words.
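In Python (dictionary is assumed to be a set of valid words):
def try_to_split(doc):
    N = len(doc)
    b = [False] * (N + 1)
    b[N] = True
    for i in range(N - 1, -1, -1):
        # "word starting at position i": every doc[i:j] found in the dictionary
        for j in range(i + 1, N + 1):
            if doc[i:j] in dictionary and b[j]:
                b[i] = True
                break
    return b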
There are some tricks you can do to make 'word starting at position i' efficient, but you're asked for an O(N^2) algorithm, so you can just look up every string starting at i in the dictionary.
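One such trick, sketched here as an assumption rather than as the answer's own method, is to store the dictionary in a trie and walk it character by character, so the scan from position i stops as soon as no dictionary word can match. The names `build_trie` and `can_split` are illustrative; the empty string is used as the end-of-word marker because it can never collide with a single character.

```python
END = ""  # end-of-word marker; never equal to a one-character key


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def can_split(doc, words):
    trie = build_trie(words)
    n = len(doc)
    b = [False] * (n + 1)
    b[n] = True  # the empty suffix splits into zero words
    for i in range(n - 1, -1, -1):
        node = trie
        for j in range(i, n):
            node = node.get(doc[j])
            if node is None:
                break  # no dictionary word starts with doc[i:j+1]
            if END in node and b[j + 1]:
                b[i] = True
                break
    return b[0]


print(can_split("itwasthebestoftimes", {"it", "was", "the", "best", "of", "times"}))
```

The inner loop now runs for at most the length of the longest dictionary word.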
To generate the words, you can either modify the above algorithm to store the good words, or just generate it like this:
def generate_words(doc, b, idx=0):
    length = 1
    while True:
        assert b[idx]
        if idx == len(doc): return
        word = doc[idx: idx + length]
        if word in dictionary and b[idx + length]:
            print(word)
            idx += length
            length = 1
        else:
            # try a longer word starting at the same position
            length += 1
Here b is the boolean array generated from the first part of the algorithm.
To formalize what #MinhPham suggested.
This is a dynamic programming solution.
Given a string str, let
b[i] = true if the substring str[0...i] (inclusive) can be split into valid words.
Prepend some starting character to str, say !, to represent the empty word.
str = "!" + str
The base case is the empty string, so
b[0] = true.
For the iterative case:
b[j] = true if there exists some i < j such that b[i] == true and str[i+1..j] is a word
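The recurrence can be sketched in Python as follows. Instead of prepending a sentinel character, this version indexes by prefix length, and it stores the start of the last word so the split can be reconstructed. The names `split_words`, `is_word` and `prev` are illustrative; `is_word` stands in for the dict(w) function from the question.

```python
def split_words(text, is_word):
    n = len(text)
    # prev[j] is the start index of the last word in a valid split of text[:j],
    # or None if text[:j] cannot be split. prev[0] = 0 marks the empty prefix.
    prev = [None] * (n + 1)
    prev[0] = 0
    for j in range(1, n + 1):
        for i in range(j):
            if prev[i] is not None and is_word(text[i:j]):
                prev[j] = i
                break
    if prev[n] is None:
        return None
    # Walk back from the end to recover the words, then restore their order.
    words = []
    j = n
    while j > 0:
        words.append(text[prev[j]:j])
        j = prev[j]
    return words[::-1]


vocabulary = {"it", "was", "the", "best", "of", "times"}
print(split_words("itwasthebestoftimes", vocabulary.__contains__))
```

This makes O(n^2) dictionary calls, which matches the bound in the question under its assumption that each call takes unit time.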
The O(N^2) DP is clear, but if you know the words of the dictionary in advance, I think you can use some precomputation, such as Aho-Corasick, to get it even faster, in O(N).
A dp solution in c++:
#include <iostream>
#include <set>
#include <string>
#include <vector>
using namespace std;
int main()
{
set<string> dict;
dict.insert("12");
dict.insert("123");
dict.insert("234");
dict.insert("12345");
dict.insert("456");
dict.insert("1234");
dict.insert("567");
dict.insert("123342");
dict.insert("42");
dict.insert("245436564");
dict.insert("12334");
string str = "123456712334245436564";
int size = str.size();
vector<int> dp(size+1, -1);
dp[0] = 0;
vector<string > res(size+1);
for(int i = 0; i < size; ++i)
{
if(dp[i] != -1)
{
for(int j = i+1; j <= size; ++j)
{
const int len = j-i;
string substr = str.substr(i, len);
if(dict.find(substr) != dict.end())
{
string space = i?" ":"";
res[i+len] = res[i] + space + substr;
dp[i+len] = dp[i]+1;
}
}
}
}
cout << *dp.rbegin() << endl;
cout << *res.rbegin() << endl;
return 0;
}
The string s[] can potentially be split in more than one way. The method below finds the maximum number of words into which s[] can be split. Below is a sketch/pseudocode of the algorithm:
bestScore[i] -> stores the maximum number of words into which the first i characters can be split (MINUS_INFINITY if they cannot be split)
bestScore[0] = 0
for (i = 1 to n){
bestScore[i] = MINUS_INFINITY
for (k = 1 to i){
bestScore[i] = Max(bestScore[i], bestScore[i-k] + f(i,k))
}
}
Where f(i,k) is defined as:
f(i,k) = 1 : if s[i-k+1 to i] is in dictionary
= MINUS_INFINITY : otherwise
bestScore[n] stores the maximum number of words into which s[] can be split (if the value is MINUS_INFINITY, s[] cannot be split)
Clearly the running time is O(n^2)
As this looks like a textbook exercise, I will not write the code to reconstruct the actual split positions.
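For the score itself (not the split positions), a Python sketch of the bestScore recurrence might look like this. It assumes the dictionary is a set, uses `float("-inf")` for MINUS_INFINITY, and the name `max_words` is illustrative.

```python
def max_words(s, dictionary):
    n = len(s)
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)
    best[0] = 0  # the empty prefix splits into zero words
    for i in range(1, n + 1):
        for k in range(1, i + 1):
            # s[i-k:i] is the candidate last word of length k
            if best[i - k] != neg_inf and s[i - k:i] in dictionary:
                best[i] = max(best[i], best[i - k] + 1)
    return best[n]


print(max_words("abcd", {"a", "b", "ab", "cd"}))  # 3
```

Here "abcd" can be split as "ab cd" or "a b cd", and the recurrence picks the three-word split.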
Below is an O(n^2) solution for this problem.
void findstringvalid() {
string s = "itwasthebestoftimes";
set<string> dict;
dict.insert("it");
dict.insert("was");
dict.insert("the");
dict.insert("best");
dict.insert("of");
dict.insert("times");
vector<bool> b(s.size() + 1, false);
vector<int> spacepos(s.size(), -1);
//Initialization phase
b[0] = true; //String of size 0 is always a valid string
for (int i = 1; i <= s.size(); i++) {
for (int j = 0; j <i; j++) {
//string of size s[ j... i]
if (!b[i]) {
if (b[j]) {
//check if string "j to i" is in dictionary
string temp = s.substr(j, i - j);
set<string>::iterator it = dict.find(temp);
if (it != dict.end()) {
b[i] = true;
spacepos[i-1] = j;
}
}
}
}
}
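if (b[s.size()]) {
// walk back from the end along the stored split points, then print in order
vector<string> words;
for (int i = s.size(); i > 0; i = spacepos[i - 1])
words.push_back(s.substr(spacepos[i - 1], i - spacepos[i - 1]));
for (int i = words.size() - 1; i >= 0; i--)
cout << words[i] << " ";
}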
}

Generate all unique substrings for given string

Given a string s, what is the fastest method to generate a set of all its unique substrings?
Example: for str = "aba" we would get substrs={"a", "b", "ab", "ba", "aba"}.
The naive algorithm would be to traverse the entire string generating substrings in length 1..n in each iteration, yielding an O(n^2) upper bound.
Is a better bound possible?
(this is technically homework, so pointers-only are welcome as well)
As other posters have said, there are potentially O(n^2) substrings for a given string, so printing them out cannot be done faster than that. However there exists an efficient representation of the set that can be constructed in linear time: the suffix tree.
There is no way to do this faster than O(n^2): a string of length n has n(n + 1) / 2 substrings, all of which can be distinct in the worst case, so generating them all has a lower bound of Ω(n^2).
The first one is brute force, with complexity O(N^3), which could be brought down to O(N^2 log(N)).
The second one uses a HashSet and has complexity O(N^2).
The third one uses LCP, after first building all the suffixes of the given string; it has worst case O(N^2) and best case O(N log(N)).
First Solution:-
import java.util.Scanner;
public class DistinctSubString {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Enter The string");
String s = in.nextLine();
long startTime = System.currentTimeMillis();
int L = s.length();
int N = L * (L + 1) / 2;
String[] Comb = new String[N];
for (int i = 0, p = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
Comb[p++] = s.substring(j, i + j + 1);
}
}
/*
* for(int j=0;j<N;++j) { System.out.println(Comb[j]); }
*/
boolean[] val = new boolean[N];
for (int i = 0; i < N; ++i)
val[i] = true;
int counter = N;
int p = 0, start = 0;
for (int i = 0, j; i < L; ++i) {
p = L - i;
for (j = start; j < (start + p); ++j) {
if (val[j]) {
//System.out.println(Comb[j]);
for (int k = j + 1; k < start + p; ++k) {
if (Comb[j].equals(Comb[k])) {
counter--;
val[k] = false;
}
}
}
}
start = j;
}
System.out.println("Substrings are " + N
+ " of which unique substrings are " + counter);
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
}
}
Second Solution:-
import java.util.*;
public class DistictSubstrings_usingHashTable {
public static void main(String args[]) {
// create a hash set
Scanner in = new Scanner(System.in);
System.out.print("Enter The string");
String s = in.nextLine();
int L = s.length();
long startTime = System.currentTimeMillis();
Set<String> hs = new HashSet<String>();
// add elements to the hash set
for (int i = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
hs.add(s.substring(j, i + j + 1));
}
}
System.out.println(hs.size());
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
}
}
Third Solution:-
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class LCPsolnFroDistinctSubString {
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter Desired String ");
String string = br.readLine();
int length = string.length();
String[] arrayString = new String[length];
for (int i = 0; i < length; ++i) {
arrayString[i] = string.substring(length - 1 - i, length);
}
Arrays.sort(arrayString);
for (int i = 0; i < length; ++i)
System.out.println(arrayString[i]);
long num_substring = arrayString[0].length();
for (int i = 0; i < length - 1; ++i) {
int j = 0;
for (; j < arrayString[i].length(); ++j) {
if (!((arrayString[i].substring(0, j + 1)).equals((arrayString)[i + 1]
.substring(0, j + 1)))) {
break;
}
}
num_substring += arrayString[i + 1].length() - j;
}
System.out.println("unique substrings = " + num_substring);
}
}
Fourth Solution:-
public static void printAllCombinations(String soFar, String rest) {
if(rest.isEmpty()) {
System.out.println(soFar);
} else {
printAllCombinations(soFar + rest.substring(0,1), rest.substring(1));
printAllCombinations(soFar , rest.substring(1));
}
}
Test case:- printAllCombinations("", "abcd");
For big oh ... Best you could do would be O(n^2)
No need to reinvent the wheel. These are not based on strings but on sets, so you will have to take the concepts and apply them to your own situation.
Algorithms
Really Good White Paper from MS
In depth PowerPoint
Blog on string perms
Well, since there are potentially n*(n+1)/2 different substrings (+1 for the empty substring), I doubt you can do better than O(n^2) (worst case). The easiest thing is to generate them and use some nice O(1) lookup table (such as a hashmap) to exclude duplicates right when you find them.
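A minimal Python sketch of that generate-and-deduplicate idea, using a built-in set as the lookup table. The name `unique_substrings` is illustrative.

```python
def unique_substrings(s):
    seen = set()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            seen.add(s[i:j])  # duplicates are dropped by the set
    return seen


print(sorted(unique_substrings("aba")))  # ['a', 'ab', 'aba', 'b', 'ba']
```

Note that each slice and each hash costs time proportional to the substring length, so the character-level work is higher than the O(n^2) count of set operations.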
class SubstringsOfAString {
public static void main(String args[]) {
String string = "Hello", sub = null;
System.out.println("Substrings of \"" + string + "\" are :-");
for (int i = 0; i < string.length(); i++) {
for (int j = 1; j <= string.length() - i; j++) {
sub = string.substring(i, j + i);
System.out.println(sub);
}
}
}
}
class program
{
List<String> lst = new List<String>();
String str = "abc";
public void func()
{
subset(0, "");
lst.Sort();
lst = lst.Distinct().ToList();
foreach (String item in lst)
{
Console.WriteLine(item);
}
}
void subset(int n, String s)
{
for (int i = n; i < str.Length; i++)
{
lst.Add(s + str[i].ToString());
subset(i + 1, s + str[i].ToString());
}
}
}
This prints unique substrings.
https://ideone.com/QVWOh0
def uniq_substring(test):
    lista = []
    for i in range(len(test)):
        for k in range(len(test) - i):
            sub = test[i:i+k+1]
            if sub not in lista and sub[::-1] not in lista:
                lista.append(sub)
    print(lista)
uniq_substring('rohit')
uniq_substring('abab')
['r', 'ro', 'roh', 'rohi', 'rohit', 'o', 'oh', 'ohi', 'ohit', 'h',
'hi', 'hit', 'i', 'it', 't']
['a', 'ab', 'aba', 'abab', 'b', 'bab']
Many answers that include 2 for loops and a .substring() call claim O(N^2) time complexity. However, it is important to note that the worst case for a .substring() call in Java (post update 6 in Java 7) is O(N). So by adding a .substring() call in your code, the order of N has increased by one.
Therefore, 2 for loops and a .substring() call within those loops equals an O(N^3) time complexity.
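To make the cubic cost concrete, one can count the characters copied when every substring of a length-n string is materialised. The sketch below assumes each substring call copies exactly its own length; the helper name `chars_copied` is made up. The total is the sum of length * (n - length + 1) over all lengths, which equals n(n + 1)(n + 2) / 6.

```python
def chars_copied(n):
    """Total characters copied when every substring of a length-n string is built."""
    # There are (n - length + 1) substrings of each length.
    return sum(length * (n - length + 1) for length in range(1, n + 1))


for n in (1, 2, 3, 100):
    assert chars_copied(n) == n * (n + 1) * (n + 2) // 6
print(chars_copied(100))  # 171700
```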
It can only be done in O(n^2) time, as the total number of unique substrings of a string can be as large as n(n+1)/2.
Example:
string s = "abcd"
pass 0: (all the strings are of length 1)
a, b, c, d = 4 strings
pass 1: (all the strings are of length 2)
ab, bc, cd = 3 strings
pass 2: (all the strings are of length 3)
abc, bcd = 2 strings
pass 3: (all the strings are of length 4)
abcd = 1 strings
Using this analogy, we can write a solution with O(n^2) time complexity and constant space complexity.
The source code is as below:
#include<stdio.h>
void print(char arr[], int start, int end)
{
int i;
for(i=start;i<=end;i++)
{
printf("%c",arr[i]);
}
printf("\n");
}
void substrings(char arr[], int n)
{
int pass,j,start,end;
int no_of_strings = n-1;
for(pass=0;pass<n;pass++)
{
start = 0;
end = start+pass;
for(j=no_of_strings;j>=0;j--)
{
print(arr,start, end);
start++;
end = start+pass;
}
no_of_strings--;
}
}
int main()
{
char str[] = "abcd";
substrings(str,4);
return 0;
}
The naive algorithm takes O(n^3) time, not O(n^2).
There are O(n^2) substrings.
If you put those O(n^2) substrings into, for example, a tree-based set,
the set performs O(lg n) comparisons per string to check whether it already exists.
Besides, each string comparison takes O(n) time.
Therefore it takes O(n^3 lg n) time if you use a set, and you can reduce that to O(n^3) if you use a hash table instead.
The point is that these are string comparisons, not number comparisons.
One of the best approaches is to use a suffix array together with the longest common prefix (LCP) algorithm, which reduces the time to O(n^2) for this problem.
Building the suffix array can be done with an O(n) time algorithm.
Time for one LCP computation = O(n).
Computing the LCP for each pair of adjacent suffixes in the suffix array therefore takes O(n^2) time in total, which gives the number of distinct substrings.
Besides, if you want to print all distinct substrings, there are O(n^2) of them to output.
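The counting idea can be sketched in Python: every suffix contributes all of its prefixes except those it shares with the previous suffix in sorted order, so the count is n(n + 1) / 2 minus the sum of adjacent LCPs. For brevity this sketch sorts the suffixes naively with `sorted` rather than with a linear-time suffix array construction, which is an assumption made only to keep the example short; `count_distinct_substrings` is an illustrative name.

```python
def count_distinct_substrings(s):
    n = len(s)
    # Naive suffix array: sort the start positions by the suffix they begin.
    suffixes = sorted(range(n), key=lambda i: s[i:])
    total = n * (n + 1) // 2
    for a, b in zip(suffixes, suffixes[1:]):
        # Longest common prefix of two adjacent suffixes in sorted order.
        lcp = 0
        while a + lcp < n and b + lcp < n and s[a + lcp] == s[b + lcp]:
            lcp += 1
        total -= lcp
    return total


print(count_distinct_substrings("banana"))  # 15
```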
Try this code using a suffix array and longest common prefix. It can also give you the total number of unique substrings. The code might give a stack overflow in visual studio but runs fine in Eclipse C++. That's because it returns vectors for functions. Haven't tested it against extremely long strings. Will do so and report back.
// C++ program for building LCP array for given text
#include <bits/stdc++.h>
#include <vector>
#include <string>
using namespace std;
#define MAX 100000
int cum[MAX];
// Structure to store information of a suffix
struct suffix
{
int index; // To store original index
int rank[2]; // To store ranks and next rank pair
};
// A comparison function used by sort() to compare two suffixes
// Compares two pairs, returns 1 if first pair is smaller
int cmp(struct suffix a, struct suffix b)
{
return (a.rank[0] == b.rank[0])? (a.rank[1] < b.rank[1] ?1: 0):
(a.rank[0] < b.rank[0] ?1: 0);
}
// This is the main function that takes a string 'txt' of size n as an
// argument, builds and return the suffix array for the given string
vector<int> buildSuffixArray(string txt, int n)
{
// A structure to store suffixes and their indexes
struct suffix suffixes[n];
// Store suffixes and their indexes in an array of structures.
// The structure is needed to sort the suffixes alphabetically
// and maintain their old indexes while sorting
for (int i = 0; i < n; i++)
{
suffixes[i].index = i;
suffixes[i].rank[0] = txt[i] - 'a';
suffixes[i].rank[1] = ((i+1) < n)? (txt[i + 1] - 'a'): -1;
}
// Sort the suffixes using the comparison function
// defined above.
sort(suffixes, suffixes+n, cmp);
// At this point, all suffixes are sorted according to first
// 2 characters. Let us sort suffixes according to first 4
// characters, then first 8 and so on
int ind[n]; // This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for (int k = 4; k < 2*n; k = k*2)
{
// Assigning rank and index values to first suffix
int rank = 0;
int prev_rank = suffixes[0].rank[0];
suffixes[0].rank[0] = rank;
ind[suffixes[0].index] = 0;
// Assigning rank to suffixes
for (int i = 1; i < n; i++)
{
// If first rank and next ranks are same as that of previous
// suffix in array, assign the same new rank to this suffix
if (suffixes[i].rank[0] == prev_rank &&
suffixes[i].rank[1] == suffixes[i-1].rank[1])
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = rank;
}
else // Otherwise increment rank and assign
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = ++rank;
}
ind[suffixes[i].index] = i;
}
// Assign next rank to every suffix
for (int i = 0; i < n; i++)
{
int nextindex = suffixes[i].index + k/2;
suffixes[i].rank[1] = (nextindex < n)?
suffixes[ind[nextindex]].rank[0]: -1;
}
// Sort the suffixes according to first k characters
sort(suffixes, suffixes+n, cmp);
}
// Store indexes of all sorted suffixes in the suffix array
vector<int>suffixArr;
for (int i = 0; i < n; i++)
suffixArr.push_back(suffixes[i].index);
// Return the suffix array
return suffixArr;
}
/* To construct and return LCP */
vector<int> kasai(string txt, vector<int> suffixArr)
{
int n = suffixArr.size();
// To store LCP array
vector<int> lcp(n, 0);
// An auxiliary array to store inverse of suffix array
// elements. For example if suffixArr[0] is 5, the
// invSuff[5] would store 0. This is used to get next
// suffix string from suffix array.
vector<int> invSuff(n, 0);
// Fill values in invSuff[]
for (int i=0; i < n; i++)
invSuff[suffixArr[i]] = i;
// Initialize length of previous LCP
int k = 0;
// Process all suffixes one by one starting from
// first suffix in txt[]
for (int i=0; i<n; i++)
{
/* If the current suffix is at n-1, then we don’t
have next substring to consider. So lcp is not
defined for this substring, we put zero. */
if (invSuff[i] == n-1)
{
k = 0;
continue;
}
/* j contains index of the next substring to
be considered to compare with the present
substring, i.e., next string in suffix array */
int j = suffixArr[invSuff[i]+1];
// Directly start matching from k'th index as
// at-least k-1 characters will match
while (i+k<n && j+k<n && txt[i+k]==txt[j+k])
k++;
lcp[invSuff[i]] = k; // lcp for the present suffix.
// Deleting the starting character from the string.
if (k>0)
k--;
}
// return the constructed lcp array
return lcp;
}
// Utility function to print an array
void printArr(vector<int>arr, int n)
{
for (int i = 0; i < n; i++)
cout << arr[i] << " ";
cout << endl;
}
// Driver program
int main()
{
int t;
cin >> t;
//t = 1;
while (t > 0) {
//string str = "banana";
string str;
cin >> str; // >> k;
vector<int>suffixArr = buildSuffixArray(str, str.length());
int n = suffixArr.size();
cout << "Suffix Array : \n";
printArr(suffixArr, n);
vector<int>lcp = kasai(str, suffixArr);
cout << "\nLCP Array : \n";
printArr(lcp, n);
// cum will hold the number of substrings, if that's what you want (total = cum[n-1])
cum[0] = n - suffixArr[0];
// vector <pair<int,int>> substrs[n];
int count = 1;
for (int i = 1; i <= n-suffixArr[0]; i++) {
//substrs[0].push_back({suffixArr[0],i});
string sub_str = str.substr(suffixArr[0],i);
cout << count << " " << sub_str << endl;
count++;
}
for(int i = 1;i < n;i++) {
cum[i] = cum[i-1] + (n - suffixArr[i] - lcp[i - 1]);
int end = n - suffixArr[i];
int begin = lcp[i-1] + 1;
int begin_suffix = suffixArr[i];
for (int j = begin, k = 1; j <= end; j++, k++) {
//substrs[i].push_back({begin_suffix, lcp[i-1] + k});
// cout << "i push " << i << " " << begin_suffix << " " << k << endl;
string sub_str = str.substr(begin_suffix, lcp[i-1] +k);
cout << count << " " << sub_str << endl;
count++;
}
}
/*int count = 1;
cout << endl;
for(int i = 0; i < n; i++){
for (auto it = substrs[i].begin(); it != substrs[i].end(); ++it ) {
string sub_str = str.substr(it->first, it->second);
cout << count << " " << sub_str << endl;
count++;
}
}*/
t--;
}
return 0;
}
And here's a simpler algorithm:
#include <iostream>
#include <string.h>
#include <vector>
#include <string>
#include <algorithm>
#include <time.h>
using namespace std;
char txt[100000], *p[100000];
int m, n;
int cmp(const void *p, const void *q) {
int rc = memcmp(*(char **)p, *(char **)q, m);
return rc;
}
int main() {
std::cin >> txt;
int start_s = clock();
n = strlen(txt);
int k; int i;
int count = 1;
for (m = 1; m <= n; m++) {
for (k = 0; k+m <= n; k++)
p[k] = txt+k;
qsort(p, k, sizeof(p[0]), &cmp);
for (i = 0; i < k; i++) {
if (i != 0 && cmp(&p[i-1], &p[i]) == 0){
continue;
}
char cur_txt[100000];
memcpy(cur_txt, p[i],m);
cur_txt[m] = '\0';
std::cout << count << " " << cur_txt << std::endl;
count++;
}
}
cout << --count << endl;
int stop_s = clock();
float run_time = (stop_s - start_s) / double(CLOCKS_PER_SEC);
cout << endl << "distinct substrings \t\tExecution time = " << run_time << " seconds" << endl;
return 0;
}
Both algorithms listed are simply too slow for extremely long strings, though. I tested them against a string of length over 47,000 and they took over 20 minutes to complete, with the first one taking 1200 seconds and the second one taking 1360 seconds, and that's just counting the unique substrings without outputting to the terminal. So for strings of length up to about 1000 you will probably get a working solution. Both solutions did compute the same total number of unique substrings, though. I also tested both algorithms against string lengths of 2000 and 10,000. The times for the first algorithm were 0.33 s and 12 s; for the second algorithm they were 0.535 s and 20 s. So it looks like the first algorithm is faster in general.
Here is my code in Python. It generates all non-empty subsequences of the given string, i.e. the contiguous substrings plus the combinations that skip characters.
def find_substring(str_in):
    if len(str_in) <= 1:
        return [str_in]
    substrs = []
    s1 = find_substring(str_in[:1])
    s2 = find_substring(str_in[1:])
    for s11 in s1:
        substrs.append(s11)
        for s21 in s2:
            substrs.append("%s%s" % (s11, s21))
    for s21 in s2:
        substrs.append(s21)
    return set(substrs)
If you pass str_in = "abcdef" to the function, it generates the following results:
a, ab, abc, abcd, abcde, abcdef, abcdf, abce, abcef, abcf, abd, abde, abdef, abdf, abe, abef, abf, ac, acd, acde, acdef, acdf, ace, acef, acf, ad, ade, adef, adf, ae, aef, af, b, bc, bcd, bcde, bcdef, bcdf, bce, bcef, bcf, bd, bde, bdef, bdf, be, bef, bf, c, cd, cde, cdef, cdf, ce, cef, cf, d, de, def, df, e, ef, f

Efficiently reverse the order of the words (not characters) in an array of characters

Given an array of characters which forms a sentence of words, give an efficient algorithm to reverse the order of the words (not characters) in it.
Example input and output:
>>> reverse_words("this is a string")
'string a is this'
It should be O(N) time and O(1) space (split() and pushing on / popping off the stack are not allowed).
The puzzle is taken from here.
A solution in C/C++:
#include <string.h> /* for strlen */
void swap(char* str, int i, int j){
char t = str[i];
str[i] = str[j];
str[j] = t;
}
void reverse_string(char* str, int length){
for(int i=0; i<length/2; i++){
swap(str, i, length-i-1);
}
}
void reverse_words(char* str){
int l = strlen(str);
//Reverse string
reverse_string(str,strlen(str));
int p=0;
//Find word boundaries and reverse word by word
for(int i=0; i<l; i++){
if(str[i] == ' '){
reverse_string(&str[p], i-p);
p=i+1;
}
}
//Finally reverse the last word.
reverse_string(&str[p], l-p);
}
This should be O(n) in time and O(1) in space.
Edit: Cleaned it up a bit.
The first pass over the string is obviously O(n/2) = O(n). The second pass is O(n + combined length of all words / 2) = O(n + n/2) = O(n), which makes this an O(n) algorithm.
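To make the counting argument above concrete, here is a small sketch that performs the same two passes on a list of characters and counts the pairwise swaps. The name count_swaps is an assumption made for this illustration, not something from the answer.

```python
def count_swaps(sentence):
    """Reverse the word order of sentence and return (result, number_of_swaps)."""
    chars = list(sentence)
    swaps = 0

    def reverse_range(lo, hi):
        # Reverse chars[lo..hi] (inclusive), counting each pairwise swap.
        nonlocal swaps
        while lo < hi:
            chars[lo], chars[hi] = chars[hi], chars[lo]
            swaps += 1
            lo += 1
            hi -= 1

    # First pass: reverse the whole string (len // 2 swaps).
    reverse_range(0, len(chars) - 1)

    # Second pass: reverse each word; every character is visited once
    # and each word of length w costs w // 2 swaps.
    start = 0
    for i in range(len(chars) + 1):
        if i == len(chars) or chars[i] == " ":
            reverse_range(start, i - 1)
            start = i + 1

    return "".join(chars), swaps


if __name__ == "__main__":
    print(count_swaps("this is a string"))
```

For the 16-character example the first pass does 8 swaps and the second does 3 + 0 + 1 + 2 = 6, so the total stays below n.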
Pushing a string onto a stack and then popping it off - is that still O(1)?
Essentially, that is the same as using split()...
Doesn't O(1) mean in-place? This task gets easy if we can just append strings and stuff, but that uses space...
EDIT: Thomas Watnedal is right. The following algorithm is O(n) in time and O(1) in space:
1. Reverse the string in-place (first iteration over the string).
2. Reverse each (reversed) word in-place (another two iterations over the string):
   - find the first word boundary
   - reverse inside this word boundary
   - repeat for the next word until finished
I guess we would need to prove that step 2 is really only O(2n)...
#include <algorithm>
#include <string>
#include <boost/next_prior.hpp>
void reverse(std::string& foo) {
    using namespace std;
    // Reverse the whole string, then reverse each word back.
    std::reverse(foo.begin(), foo.end());
    string::iterator begin = foo.begin();
    while (1) {
        string::iterator space = find(begin, foo.end(), ' ');
        std::reverse(begin, space);
        // Stop before stepping past end(); advancing end() is undefined.
        if (space == foo.end())
            break;
        begin = boost::next(space);
    }
}
Here is my answer. No library calls and no temp data structures.
#include <stdio.h>
void reverse(char* string, int length){
int i;
for (i = 0; i < length/2; i++){
string[length - 1 - i] ^= string[i] ;
string[i] ^= string[length - 1 - i];
string[length - 1 - i] ^= string[i];
}
}
int main () {
char string[] = "This is a test string";
char *ptr;
int i = 0;
int word = 0;
ptr = (char *)&string;
printf("%s\n", string);
int length=0;
while (*ptr++){
++length;
}
reverse(string, length);
printf("%s\n", string);
for (i=0;i<length;i++){
if(string[i] == ' '){
reverse(&string[word], i-word);
word = i+1;
}
}
reverse(&string[word], i-word); //for last word
printf("\n%s\n", string);
return 0;
}
In pseudo code:
reverse input string
reverse each word (you will need to find word boundaries)
@Daren Thomas
Implementation of your algorithm (O(N) in time, O(1) in space) in D (Digital Mars):
#!/usr/bin/dmd -run
/**
* to compile & run:
* $ dmd -run reverse_words.d
* to optimize:
* $ dmd -O -inline -release reverse_words.d
*/
import std.algorithm: reverse;
import std.stdio: writeln;
import std.string: find;
void reverse_words(char[] str) {
// reverse whole string
reverse(str);
// reverse each word
for (auto i = 0; (i = find(str, " ")) != -1; str = str[i + 1..length])
reverse(str[0..i]);
// reverse last word
reverse(str);
}
void main() {
char[] str = cast(char[])("this is a string");
writeln(str);
reverse_words(str);
writeln(str);
}
Output:
this is a string
string a is this
In Ruby:
"this is a string".split.reverse.join(" ")
In C: (C99)
#include <stdio.h>
#include <string.h>
void reverseString(char* string, int length)
{
char swap;
for (int i = 0; i < length/2; i++)
{
swap = string[length - 1 - i];
string[length - 1 - i] = string[i];
string[i] = swap;
}
}
int main (int argc, const char * argv[]) {
char teststring[] = "Given an array of characters which form a sentence of words, give an efficient algorithm to reverse the order of the words (not characters) in it.";
printf("%s\n", teststring);
int length = strlen(teststring);
reverseString(teststring, length);
int i = 0;
while (i < length)
{
int wordlength = strspn(teststring + i, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
reverseString(teststring + i, wordlength);
i += wordlength + 1;
}
printf("%s\n", teststring);
return 0;
}
This gives output:
Given an array of characters which
form a sentence of words, give an
efficient algorithm to reverse the
order of the words (not characters) in
it.
.it in )characters not( words the
of order the reverse to algorithm
efficient an give ,words of sentence a
form which characters of array an
Given
This takes at most 4N time, with small constant space.
Unfortunately, it doesn't handle punctuation or case gracefully.
O(N) in space and O(N) in time solution in Python:
def reverse_words_nosplit(str_):
    """
    >>> f = reverse_words_nosplit
    >>> f("this is a string")
    'string a is this'
    """
    iend = len(str_)
    s = ""
    while True:
        ispace = str_.rfind(" ", 0, iend)
        if ispace == -1:
            s += str_[:iend]
            break
        s += str_[ispace+1:iend]
        s += " "
        iend = ispace
    return s
You would use what is known as an iterative recursive function, which is O(N) in time as it takes N (N being the number of words) iterations to complete and O(1) in space as each iteration holds its own state within the function arguments.
(define (reverse sentence-to-reverse)
  (reverse-iter sentence-to-reverse ""))
(define (reverse-iter sentence reverse-sentence)
  (if (= 0 (string-length sentence))
      reverse-sentence
      (reverse-iter (remove-first-word sentence)
                    (add-first-word sentence reverse-sentence))))
Note: I have written this in Scheme, in which I am a complete novice, so apologies for the lack of correct string manipulation.
remove-first-word finds the first word boundary of sentence, removes that section of characters (including space and punctuation), and returns the new sentence.
add-first-word finds the first word boundary of sentence, takes that section of characters (including space and punctuation), adds it to reverse-sentence, and returns the new reverse-sentence contents.
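As a rough illustration of the accumulator idea described above, here is a Python sketch. The helper names mirror the ones in the answer, but their bodies are assumptions, since the answer only describes them in prose. Note that each step builds new strings, so this particular sketch does not run in O(1) space.

```python
def remove_first_word(sentence):
    """Return sentence without its first word and the space that follows it."""
    boundary = sentence.find(" ")
    return "" if boundary == -1 else sentence[boundary + 1:]


def add_first_word(sentence, reversed_sentence):
    """Return reversed_sentence with the first word of sentence put in front."""
    boundary = sentence.find(" ")
    first = sentence if boundary == -1 else sentence[:boundary]
    if not reversed_sentence:
        return first
    return first + " " + reversed_sentence


def reverse_words_iter(sentence):
    """Loop form of the tail call: the state lives in the two variables."""
    reversed_sentence = ""
    while sentence:
        # The right-hand side is evaluated with the old value of sentence.
        sentence, reversed_sentence = (
            remove_first_word(sentence),
            add_first_word(sentence, reversed_sentence),
        )
    return reversed_sentence


if __name__ == "__main__":
    print(reverse_words_iter("this is a string"))
```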
This program reverses a sentence using pointers in C, by Vasantha Kumar & Sundaramoorthy from Kongu Engg College, Erode.
NOTE: A trailing dot (.) at the end of the sentence is dropped from the output. The string literal is NUL-terminated automatically, so the dot is not needed as an end marker.
#include <stdio.h>
#include <string.h>
int main(void)
{
    const char *s = "this is good.";
    const char *end = s + strlen(s);    /* one past the last character */
    if (end > s && end[-1] == '.')
        end--;                          /* drop the trailing dot */
    while (end > s)
    {
        const char *p = end;
        while (p > s && p[-1] != ' ')   /* walk back to the start of the word */
            p--;
        printf("%.*s", (int)(end - p), p);
        if (p > s)
            printf(" ");
        end = (p > s) ? p - 1 : s;      /* step over the separating space */
    }
    printf("\n");
    return 0;
}
Push each word onto a stack. Pop all the words off the stack.
using System;
namespace q47407
{
class MainClass
{
public static void Main(string[] args)
{
string s = Console.ReadLine();
string[] r = s.Split(' ');
for(int i = r.Length-1 ; i >= 0; i--)
Console.Write(r[i] + " ");
Console.WriteLine();
}
}
}
Edit: I guess I should read the whole question... carry on.
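For reference, the stack idea stated at the top of this answer could be sketched as follows without calling split(). The function name is an assumption for illustration. The stack holds every word, so this uses O(N) extra space, which is exactly what the question rules out.

```python
def reverse_words_with_stack(sentence):
    """Push each space-separated word onto a stack, then pop them all off."""
    stack = []
    word = []
    for ch in sentence:
        if ch == " ":
            # A space closes the current word.
            stack.append("".join(word))
            word = []
        else:
            word.append(ch)
    stack.append("".join(word))  # the last word has no trailing space

    popped = []
    while stack:
        popped.append(stack.pop())  # words come off in reverse order
    return " ".join(popped)


if __name__ == "__main__":
    print(reverse_words_with_stack("this is a string"))
```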
Efficient in terms of my time: took under 2 minutes to write in REBOL:
reverse_words: func [s [string!]] [form reverse parse s none]
Try it out:
reverse_words "this is a string"
"string a is this"
A C++ solution:
#include <string>
#include <iostream>
#include <cwctype> // for iswspace
using namespace std;
string revwords(string in) {
string rev;
int wordlen = 0;
for (int i = in.length(); i >= 0; --i) {
if (i == 0 || iswspace(in[i-1])) {
if (wordlen) {
for (int j = i; wordlen--; )
rev.push_back(in[j++]);
wordlen = 0;
}
if (i > 0)
rev.push_back(in[i-1]);
}
else
++wordlen;
}
return rev;
}
int main() {
cout << revwords("this is a sentence") << "." << endl;
cout << revwords(" a sentence with extra spaces ") << "." << endl;
return 0;
}
A Ruby solution.
# Reverse all words in string
def reverse_words(string)
return string if string == ''
reverse(string, 0, string.size - 1)
bounds = next_word_bounds(string, 0)
while bounds[:from] < string.size
reverse(string, bounds[:from], bounds[:to])
bounds = next_word_bounds(string, bounds[:to] + 1)
end
string
end
# Reverse a single word between indices "from" and "to" in "string"
def reverse(s, from, to)
half = (to - from + 1) / 2
half.times do |i|
s[from], s[to] = s[to], s[from]
from, to = from.next, to.pred
end
s
end
# Find the boundaries of the next word starting at index "from"
def next_word_bounds(s, from)
from = s.index(/\S/, from) || s.size
to = s.index(/\s/, from + 1) || s.size
return { from: from, to: to - 1 }
end
In C#, in-place, O(n), and tested:
static char[] ReverseAllWords(char[] in_text)
{
int lindex = 0;
int rindex = in_text.Length - 1;
if (rindex > 1)
{
//reverse complete phrase
in_text = ReverseString(in_text, 0, rindex);
//reverse each word in resultant reversed phrase
for (rindex = 0; rindex <= in_text.Length; rindex++)
{
if (rindex == in_text.Length || in_text[rindex] == ' ')
{
in_text = ReverseString(in_text, lindex, rindex - 1);
lindex = rindex + 1;
}
}
}
return in_text;
}
static char[] ReverseString(char[] intext, int lindex, int rindex)
{
char tempc;
while (lindex < rindex)
{
tempc = intext[lindex];
intext[lindex++] = intext[rindex];
intext[rindex--] = tempc;
}
return intext;
}
This problem can be solved in O(n) time and O(1) space. Sample code:
public static string reverseWords(String s)
{
char[] stringChar = s.ToCharArray();
int length = stringChar.Length, tempIndex = 0;
Swap(stringChar, 0, length - 1);
for (int i = 0; i < length; i++)
{
if (i == length-1)
{
Swap(stringChar, tempIndex, i);
tempIndex = i + 1;
}
else if (stringChar[i] == ' ')
{
Swap(stringChar, tempIndex, i-1);
tempIndex = i + 1;
}
}
return new String(stringChar);
}
private static void Swap(char[] p, int startIndex, int endIndex)
{
while (startIndex < endIndex)
{
p[startIndex] ^= p[endIndex];
p[endIndex] ^= p[startIndex];
p[startIndex] ^= p[endIndex];
startIndex++;
endIndex--;
}
}
Algorithm:
1) Reverse each word of the string.
2) Reverse the resultant string.
public class Solution {
public String reverseWords(String p) {
String reg=" ";
if(p==null||p.length()==0||p.equals(""))
{
return "";
}
String[] a=p.split("\\s+");
StringBuilder res=new StringBuilder();
for(int i=0;i<a.length;i++)
{
String temp=doReverseString(a[i]);
res.append(temp);
res.append(" ");
}
String resultant=doReverseString(res.toString());
System.out.println(res);
return resultant.toString().replaceAll("^\\s+|\\s+$", "");
}
public String doReverseString(String s) {
char str[]=s.toCharArray();
int start=0,end=s.length()-1;
while(start<end)
{
char temp=str[start];
str[start]=str[end];
str[end]=temp;
start++;
end--;
}
String a=new String(str);
return a;
}
public static void main(String[] args)
{
Solution r=new Solution();
String main=r.reverseWords("kya hua");
//System.out.println(re);
System.out.println(main);
}
}
A one liner:
l="Is this as expected ??"
" ".join(each[::-1] for each in l[::-1].split())
Output:
'?? expected as this Is'
The algorithm is a two-step process: first reverse the individual words of the string, then reverse the whole string. The implementation takes O(n) time and O(1) space.
#include <stdio.h>
#include <string.h>
void reverseStr(char* s, int start, int end);
int main()
{
char s[] = "This is test string";
int start = 0;
int end = 0;
int i = 0;
while (1) {
if (s[i] == ' ' || s[i] == '\0')
{
reverseStr(s, start, end-1);
start = i + 1;
end = start;
}
else{
end++;
}
if(s[i] == '\0'){
break;
}
i++;
}
reverseStr(s, 0, strlen(s)-1);
printf("\n\noutput= %s\n\n", s);
return 0;
}
void reverseStr(char* s, int start, int end)
{
char temp;
int j = end;
int i = start;
for (i = start; i < j ; i++, j--) {
temp = s[i];
s[i] = s[j];
s[j] = temp;
}
}

Resources