Deleting 'b' and duplicating 'a' in-place - algorithm

This is a homework task: given a string containing only 'a', 'b' and 'c', duplicate every 'a', delete every 'b' and leave the 'c's alone (i.e. just copy them to the new string).
For example:
Input: "abbc"
Output: "aac"
This should be done in linear time. Extra points for doing it in-place in the same string.
I thought of the following: allocate some space at the end of the string and build the final string there, working backwards while keeping an index into the original string (code in C++):
#include <algorithm>
#include <iterator>
#include <string>
using namespace std;

int main() {
    string str;
    str.resize(50);
    char chars[] = "abbc";
    size_t slen = sizeof(chars) - 1;
    copy(begin(chars), end(chars), str.begin());
    int j = 49;
    // Walk the input backwards and write the result at the end of the buffer
    for (int i = slen - 1; i >= 0; --i) {
        if (str[i] == 'a') {
            str[j] = 'a';
            str[--j] = 'a';
            --j;
        }
        else if (str[i] == 'c') {
            str[j] = 'c';
            --j;
        }
        // Nothing for b
    }
    ++j;
    str.erase(str.begin(), str.begin() + j);
}
This seems to work, but I'd like to know if there's a better way to do it (indices are easy to mess up) and whether I'm overlooking something.

Since you are creating a new string, there is no need to do it backwards. Processing from the front will be clearer. Even though this is homework, you should strive for general code and avoid magic numbers (like 49).
Ask yourself what's the longest possible output string. How can you compute it? Can you write code which can deal with any input string, or handle failures in a graceful manner (not crash)?
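The front-to-back version suggested above could look like the following sketch. The function name transformCopy is made up for illustration. It relies on the observation that the output can never be longer than twice the input, which happens when every character is an 'a'.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the result in a new string, front to back.
std::string transformCopy(const std::string& in) {
    std::string out;
    out.reserve(2 * in.size());      // worst case: every character is an 'a'
    for (char c : in) {
        if (c == 'a') {
            out.append(2, 'a');      // duplicate 'a'
        } else if (c != 'b') {
            out.push_back(c);        // copy 'c' (and anything else) unchanged
        }                            // 'b' is skipped
    }
    assert(out.size() <= 2 * in.size());
    return out;
}
```

Because the bound is computed from the input length, no fixed buffer size such as 50 is needed.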

Doing it in-place is easy, if you don't need to expand the size of the string. You just need to realize that you can use two passes. 2*n is just as linear as n.
In the first pass, you delete all the 'b' characters by copying from an input index to an output index, only incrementing the output index if the character is not 'b'. At the same time count the number of 'a' characters.
At this point you have enough information to tell if the final string will be the same size or less than the original string, and you can do the appropriate error handling if not.
The second pass works from the end of the string to the beginning. Start with the input index being the end of the current string, and the output index being that plus the number of 'a' characters that you counted earlier. Each time you copy an 'a' you insert an extra and decrement the output index twice.
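The two passes described above can be sketched as follows. This is only one possible reading of the answer: the name transformInPlace is hypothetical, and the sketch assumes a std::string that may be resized. With a fixed-size buffer, the resize call is the point where you would check the capacity and report an error instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: delete every 'b' and duplicate every 'a' inside s.
void transformInPlace(std::string& s) {
    // Pass 1: compact the string by dropping 'b', counting 'a' on the way.
    std::size_t out = 0;
    std::size_t countA = 0;
    for (std::size_t in = 0; in < s.size(); ++in) {
        if (s[in] == 'b') continue;
        if (s[in] == 'a') ++countA;
        s[out++] = s[in];
    }
    const std::size_t compacted = out;

    // The final length is now known: one extra slot per 'a'.
    s.resize(compacted + countA);

    // Pass 2: copy from back to front so unread characters are never overwritten.
    std::size_t write = s.size();
    for (std::size_t read = compacted; read-- > 0; ) {
        const char c = s[read];
        s[--write] = c;
        if (c == 'a') s[--write] = 'a';   // the duplicate
    }
    assert(write == 0);
}
```

Each pass touches every character at most once, so the total work stays linear in the length of the string.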

Related

Rank of string solution

I was going through a question that asks you to find the rank of a string among its permutations sorted lexicographically.
An O(N^2) solution is pretty clear.
Some websites also have an O(N) solution. The optimized part is basically pre-populating a count array such that
count[i] contains the count of characters which are present in str and are smaller than i.
I understand that this would reduce the complexity, but I can't wrap my head around how this array is calculated. This is the function that does it (taken from the link):
// Construct a count array where value at every index
// contains count of smaller characters in whole string
void populateAndIncreaseCount (int* count, char* str)
{
    int i;
    for( i = 0; str[i]; ++i )
        ++count[ str[i] ];
    for( i = 1; i < 256; ++i )
        count[i] += count[i-1];
}
Can someone please provide an intuitive explanation of this function?
That solution is doing a Bucket Sort and then sorting the output.
A bucket sort is O(items + number_of_possible_distinct_inputs) which for a fixed alphabet can be advertised as O(n).
However, in practice Unicode makes for a pretty large alphabet. I would therefore suggest a quicksort instead, because a quicksort that divides into the three buckets of <, > and = is efficient for a large character set but still takes advantage of a small one.
Understood after going through it again. I got confused by the syntax of the C++ version. It's actually doing a pretty simple thing. Here's the Java version:
void populateAndIncreaseCount(int[] count, String str) {
    // count is initialized to zero for all indices
    for (int i = 0; i < str.length(); ++i) {
        count[str.charAt(i)]++;
    }
    for (int i = 1; i < 256; ++i)
        count[i] += count[i - 1];
}
After the first loop, the indices whose characters are present in the string are non-zero. After the second loop, each count[i] is the running total of the counts up to and including index i. Since the indices are in lexicographic order, count[i - 1] is the number of characters in the string that are smaller than character i. After each character is processed, we also update the count array:
// Removes a character ch from count[] array
// constructed by populateAndIncreaseCount()
void updatecount (int* count, char ch)
{
    int i;
    for( i = ch; i < MAX_CHAR; ++i )
        --count[i];
}
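To see how the two helpers fit together, here is a sketch of a rank computation built on the same prefix-count idea. The name rankOfString and the surrounding loop are assumptions for illustration, not the code from the linked page. The sketch assumes all characters are distinct and the string has at most 20 characters, so that the factorial fits in an unsigned 64-bit integer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: 1-based rank of s among its sorted permutations.
// Assumes distinct characters and s.size() <= 20 (20! fits in 64 bits).
unsigned long long rankOfString(const std::string& s) {
    assert(s.size() <= 20);
    constexpr std::size_t kAlphabet = 256;

    // count[i] = number of characters in s with code <= i (prefix sums).
    std::array<int, kAlphabet> count{};
    for (unsigned char c : s) ++count[c];
    for (std::size_t i = 1; i < kAlphabet; ++i) count[i] += count[i - 1];

    // mul starts as n! and becomes (n - 1 - i)! at position i.
    unsigned long long mul = 1;
    for (std::size_t k = 2; k <= s.size(); ++k) mul *= k;

    unsigned long long rank = 1;
    for (std::size_t i = 0; i < s.size(); ++i) {
        mul /= (s.size() - i);
        const unsigned char c = static_cast<unsigned char>(s[i]);
        // Unused characters strictly smaller than c.
        const int smaller = (c == 0) ? 0 : count[c - 1];
        rank += static_cast<unsigned long long>(smaller) * mul;
        // Remove c from the counts, as updatecount() does.
        for (std::size_t j = c; j < kAlphabet; ++j) --count[j];
    }
    return rank;
}
```

For "string", the first character 's' has four smaller characters (g, i, n, r), contributing 4 * 5! = 480 permutations that sort earlier; the remaining positions add 96, 18, 2 and 1, giving rank 598.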

Find the students in batch who have all the subjects passed comparing the keywords

The question goes like this. Students are grouped into batches of 5. Within a batch, each of the 5 subjects should have been passed by at least one student. The 5 subjects are physics (p), chemistry (c), botany (b), maths (m) and zoology (z). We need to find the batches of students that satisfy this.
So there could be a batch like
batch-1
pcmbz
batch2
pccmb (not one student passed in zoology)
batch3
zmbcc (not one student passed in physics)
etc...
So if the user gives an input like pcmbzpczbmpccmb, there are 3 batches, of which 2 have at least one student passing each of the 5 subjects.
My code :
static int team(string skills)
{
    char[] subjects = { 'p', 'c', 'm', 'b', 'z' };
    int count = 0;
    int p = 0, c = 0, m = 0, b = 0, z = 0;
    int divisor = 0;
    char[] result = skills.ToCharArray();
    StringBuilder sb = new StringBuilder();
    //parse the string to 5 chars each representing the 5 students subject.
    for (int l = 0; l < skills.Length; l++)
    {
        if (l % 5 == 0 && l > 0)
        {
            sb.Append(" ");
        }
        sb.Append(skills[l]);
    }
    string format = sb.ToString();
    char space = ' ';
    string[] resultarray = format.Split(space);
    for (int i = 0; i < resultarray.Length; i++)
    {
        if (resultarray[i].Contains("pcmbz"))
        {
            count = count + 1;
        }
    }
    return count;
}
However, when I use Contains, it only matches the exact string and does not recognize a jumbled one. Here pcmbz and pczbm should be treated as the same.
Should I add anagram logic to the code and check whether the strings match before adding to the count, or is there a better way to do this?
First of all, I have to say that I'm far from an experienced C# programmer. I'm also fairly certain that my solution isn't the best-performing one, but it gets the job done if I understand your problem correctly.
Checking for anagrams like you've suggested would also be possible but in case you'd want to change your subjects in future you'd have to change every single anagram string. What we want to do is to check whether every subject character exists in the given string.
Here's how I'd do the last for-loop in your code (everything else remains the way you've done it):
for (int i = 0; i < resultarray.Length; i++)
{
    bool containsAllSubjects = true;
    foreach(char sub in subjects)
    {
        if (!resultarray[i].Contains(sub)) containsAllSubjects = false;
    }
    if (containsAllSubjects) count++;
}
Now let me explain what this code is doing piece by piece:
foreach(char sub in subjects)
With this foreach-Loop we get every character you've put into the subjects array. This is just a convenience to ensure the code even works if you'd change your subject characters.
if(!resultarray[i].Contains(sub)) containsAllSubjects = false;
Because we're iterating with every single sub character from your subjects array over this expression we check whether the current string from the resultarray contains every single one of the subject characters. If one or more subject characters are missing in the current string, we set a boolean variable to false.
if(containsAllSubjects) count++;
Since the boolean variable containsAllSubjects is only true when every single subject was inside the string we've checked we can increase the count by one.
Another thing I'd recommend is to change the separation of your skills string. The way you're doing it right now is to separate after 5 characters which is the correct way to do for you when your subjects array contains 5 elements. However, if you ever wanted to change the number of subjects you'd have to think of changing the hard-coded magic number 5 in your skills string separation, too. This is why I'd recommend separating according to the number of elements the subjects array contains:
if (l % subjects.Length == 0 && l > 0)
{
    sb.Append(" ");
}
That way your code becomes flexible in regard of the number of subjects.
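The same check can be done without building an intermediate space-separated string. The following C++ sketch is an illustration only, not a translation of the answer's C# code: the name countCompleteBatches is hypothetical, the batch size is taken from the number of subjects, and a trailing group shorter than a full batch is assumed to be ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts batches in which every subject occurs at least once.
int countCompleteBatches(const std::string& skills,
                         const std::string& subjects = "pcmbz") {
    assert(!subjects.empty());
    const std::size_t batchSize = subjects.size();
    int count = 0;
    // Step through the input one full batch at a time.
    for (std::size_t start = 0; start + batchSize <= skills.size(); start += batchSize) {
        const auto first = skills.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(batchSize);
        bool containsAllSubjects = true;
        for (char sub : subjects) {
            if (std::find(first, last, sub) == last) {   // subject missing in this batch
                containsAllSubjects = false;
                break;
            }
        }
        if (containsAllSubjects) ++count;
    }
    return count;
}
```

Changing the set of subjects then only requires passing a different subjects string.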
I hope my answer could help you at least somewhat with your question.

Input multiple strings with spaces in c++ in 2d char array

For a given integer n at runtime, I have to input n strings which can have spaces in between them.
The test case format for input is:
3
xyz b
abcd
defg
So I am taking input like this because cin skips spaces.
int n, column = 1000; // maximum size of strings = 1000
cin >> n;
char **String = 0;
String = new char *[n];
int i;
for (i = 0; i < n; i++) {
    String[i] = new char[column];
}
for (i = 0; i < n; i++) {
    cin.getline(String[i], 1000);
}
After the 2nd string, i.e. "abcd", it's taking a newline as the 3rd string. Why is that?
If this is wrong, how do I take input in this case?
Your code is correct. The problem lies in how the input is given at the terminal.
Suppose I run the program with n = 2, i.e. I wish to input two strings. If I press Enter after typing 2, the first string that goes into String is an empty one, because cin >> n leaves the newline in the input and the first getline reads only up to it. But if I type the first string I intend to input on the same line, right after the 2 (with no space after the 2), the problem goes away.
If I don't want to change how I enter the input (i.e. I still want to press Enter after the number of strings and then type the strings on the following lines), I can call cin.getline(String[0], 1000) once before the following loop in the above code:
for (i = 0; i < n; i++)
    cin.getline(String[i], 1000);
That first call reads the empty remainder of the line containing the 2 (the input n from the example above) into String[0]. The loop that follows then starts afresh, and the next line typed at the terminal (the first string we actually intend to input) overwrites String[0].
So, the problem is solved then.
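A common alternative to the extra getline call is to discard the rest of the count line explicitly with ignore. The sketch below is an assumption-laden illustration rather than a fix to the code above: readLines is a made-up name, and it uses std::string with std::getline in place of the 2D char array, so no maximum line length is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, handy for trying this with canned input
#include <string>
#include <vector>

// Hypothetical helper: reads a count n, then up to n whole lines (spaces included).
std::vector<std::string> readLines(std::istream& in) {
    std::size_t n = 0;
    if (!(in >> n)) return {};
    // Drop everything left on the count line, including its newline.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    std::vector<std::string> lines;
    std::string line;
    for (std::size_t i = 0; i < n && std::getline(in, line); ++i) {
        lines.push_back(line);
    }
    assert(lines.size() <= n);
    return lines;
}
```

Calling readLines(std::cin) on the test case from the question would return the three strings, with "xyz b" kept as a single entry.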

Utilizing a trie Data Struture

I am implementing a trie for reading unique words from a file. I was looking online for how to implement it and came across this way of doing it:
//to insert the string in the trie tree
void insert(struct node *head, string str)
{
    int i, j;
    for(i = 0;i < str.size(); ++i){
        //if the child node is pointing to NULL
        if(head -> next_char[str[i] - 'a'] == NULL){
            struct node *n;
            //initialise the new node
            n = new struct node;
            for(j = 0;j < 26; ++j){
                n -> next_char[j] = NULL;
            }
            n -> end_string = 0;
            head -> next_char[str[i] - 'a'] = n;
            head = n;
        }
        //if the child node is not pointing to NULL
        else head = head -> next_char[str[i] - 'a'];
    }
    //to mark the end_string flag for this string
    head -> end_string = 1;
}
My confusion arises from the line:
head -> next_char[str[i] - 'a'] == NULL
What is the purpose of subtracting 'a' everywhere this code indexes next_char?
A trie makes sense when your input strings consist of characters from some relatively small, fixed alphabet.
In this concrete implementation it is assumed that the characters are in the range a..z, 26 in total.
In many languages the char type is actually an integer or byte type, so you can perform arithmetic on it. When you do, the character's code is used as the operand.
With that in mind, the easiest way to map chars from some known non-zero-based range to a zero-based range is to subtract the first element of the range from the code of the particular character.
For 'a'..'z' range:
when you do ('a' - 'a') you get 0
'b' - 'a' = 1
...
'z' - 'a' = 25
I'll add a small piece of information to @Aivean's answer, which is perfect.
In this implementation, each node in the trie contains a fixed-size array of 26 pointers to its children.
The goal is to find the correct child in constant time, and hence to check whether it exists.
To find the correct child (its position in the array of 26) we use current_Char - 'a', as explained in @Aivean's answer.
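A compact sketch of the same idea, with the index mapping pulled out into its own function. All names here (TrieNode, childIndex, insertWord, containsWord) are made up for illustration and differ from the code in the question. The sketch assumes C++14 for std::make_unique and an encoding such as ASCII in which 'a'..'z' are contiguous.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> next;  // one slot per letter a..z
    bool endOfWord = false;
};

bool isLowercaseLetter(char c) { return c >= 'a' && c <= 'z'; }

// Maps 'a'..'z' to 0..25 by subtracting the first element of the range.
std::size_t childIndex(char c) {
    assert(isLowercaseLetter(c));
    return static_cast<std::size_t>(c - 'a');
}

// Returns false (and inserts nothing) if word has characters outside a..z.
bool insertWord(TrieNode& root, const std::string& word) {
    for (char c : word) {
        if (!isLowercaseLetter(c)) return false;
    }
    TrieNode* node = &root;
    for (char c : word) {
        auto& child = node->next[childIndex(c)];     // constant-time lookup
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->endOfWord = true;
    return true;
}

bool containsWord(const TrieNode& root, const std::string& word) {
    const TrieNode* node = &root;
    for (char c : word) {
        if (!isLowercaseLetter(c)) return false;
        node = node->next[childIndex(c)].get();
        if (!node) return false;
    }
    return node->endOfWord;
}
```

Without the subtraction, the array would have to be indexed by the raw character code, so it would need room for every possible char value instead of just 26 slots.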

Longest common prefix for n string

Given n strings of maximum length m, how can we find the longest common prefix shared by at least two strings among them?
Example: ['flower', 'flow', 'hello', 'fleet']
Answer: fl
I was thinking of building a trie for all the strings and then checking the deepest node (satisfies longest) that branches out to two or more substrings (satisfies commonality). This takes O(n*m) time and space. Is there a better way to do this?
Why use a trie (which takes O(mn) time and O(mn) space)? Just use the basic brute-force approach. In a first loop, find the shortest string, minStr, which takes O(n) time. In a second loop, compare each string with minStr character by character, keeping a variable that marks the end of the matching prefix in minStr. This loop takes O(mn), where m is the length of the shortest string. The code is below:
public String longestCommonPrefix(String[] strs) {
    if (strs.length == 0) return "";
    String minStr = strs[0];
    for (int i = 1; i < strs.length; i++) {
        if (strs[i].length() < minStr.length())
            minStr = strs[i];
    }
    int end = minStr.length();
    for (int i = 0; i < strs.length; i++) {
        int j;
        for (j = 0; j < end; j++) {
            if (minStr.charAt(j) != strs[i].charAt(j))
                break;
        }
        if (j < end)
            end = j;
    }
    return minStr.substring(0, end);
}
There is an O(|S|*n) solution to this problem, using a trie [n is the number of strings, S is the longest string]:
(1) put all strings in a trie
(2) do a DFS in the trie, until you find the first vertex with more than 1 "edge".
(3) the path from the root to the node you found at (2) is the longest common prefix.
No faster solution is possible [in terms of big-O notation]: in the worst case, all your strings are identical, and you need to read all of them to know it.
I would sort them, which you can do in n lg n time. Then any strings with common prefixes will be right next to each other. In fact, you should be able to keep a pointer to the index you're currently looking at and work your way down for a pretty speedy computation.
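The sorting idea can be sketched as below. This is an illustration under stated assumptions, not the answerer's code: the name longestPrefixSharedByTwo is hypothetical, and the sketch takes "shared by at least two strings" literally. Under that reading, the result for the question's example is "flow" (shared by "flower" and "flow"). Note that each string comparison during the sort can itself cost up to O(m), so the total is O(m * n lg n) character comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: longest prefix shared by at least two of the strings.
std::string longestPrefixSharedByTwo(std::vector<std::string> strs) {
    // After sorting, the pair with the longest common prefix is adjacent.
    std::sort(strs.begin(), strs.end());

    std::size_t bestLen = 0;
    std::size_t bestIdx = 0;
    for (std::size_t i = 1; i < strs.size(); ++i) {
        const std::string& a = strs[i - 1];
        const std::string& b = strs[i];
        std::size_t len = 0;
        while (len < a.size() && len < b.size() && a[len] == b[len]) ++len;
        if (len > bestLen) {
            bestLen = len;
            bestIdx = i;
        }
    }
    if (bestLen == 0) return {};
    assert(bestLen <= strs[bestIdx].size());
    return strs[bestIdx].substr(0, bestLen);
}
```

The vector is taken by value so the caller's ordering is left untouched.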
As a completely different answer from my other answer...
You can, with one pass, bucket every string based on its first letter.
With another pass you can sort each bucket based on its second letter. (This is known as radix sort, which is O(n*m) overall and O(n) for each pass.) This gives you a baseline prefix of 2.
You can safely remove from your dataset any elements that do not have a prefix of 2.
You can continue the radix sort, removing elements without a shared prefix of p, as p approaches m.
This will give you the same O(n*m) time that the trie approach does, but will always be faster than the trie since the trie must look at every character in every string (as it enters the structure), while this approach is only guaranteed to look at 2 characters per string, at which point it culls much of the dataset.
The worst case is still that every string is identical, which is why it shares the same big-O notation, but it will be faster in all other cases, as it is guaranteed to use fewer comparisons: in any non-worst case there are characters that never need to be visited.
public String longestCommonPrefix(String[] strs) {
    if (strs == null || strs.length == 0)
        return "";
    char[] c_list = strs[0].toCharArray();
    int len = c_list.length;
    int j = 0;
    for (int i = 1; i < strs.length; i++) {
        for (j = 0; j < len && j < strs[i].length(); j++)
            if (c_list[j] != strs[i].charAt(j))
                break;
        len = j;
    }
    return new String(c_list).substring(0, len);
}
It happens that the bucket sort (radix sort) described by corsiKa can be extended such that every string is eventually placed alone in a bucket, and at that point the LCP for such a lonely string is known. Further, the shustring of each string is also known; it is one character longer than the LCP. The bucket sort is de facto the construction of a suffix array, but only partially so. The comparisons that are not performed (as described by corsiKa) represent those portions of the suffix strings that are not added to the suffix array. Finally, this method allows for determining not just the LCP and shustrings; one may also easily find those subsequences that are not present within the string.
Since the world is obviously begging for an answer in Swift, here's mine ;)
func longestCommonPrefix(strings:[String]) -> String {
    var commonPrefix = ""
    var indices = strings.map { $0.startIndex}
    outerLoop:
    while true {
        var toMatch: Character = "_"
        for (whichString, f) in strings.enumerate() {
            let cursor = indices[whichString]
            if cursor == f.endIndex { break outerLoop }
            indices[whichString] = cursor.successor()
            if whichString == 0 { toMatch = f[cursor] }
            if toMatch != f[cursor] { break outerLoop }
        }
        commonPrefix.append(toMatch)
    }
    return commonPrefix
}
Swift 3 Update:
func longestCommonPrefix(strings:[String]) -> String {
    var commonPrefix = ""
    var indices = strings.map { $0.startIndex}
    outerLoop:
    while true {
        var toMatch: Character = "_"
        for (whichString, f) in strings.enumerated() {
            let cursor = indices[whichString]
            if cursor == f.endIndex { break outerLoop }
            indices[whichString] = f.characters.index(after: cursor)
            if whichString == 0 { toMatch = f[cursor] }
            if toMatch != f[cursor] { break outerLoop }
        }
        commonPrefix.append(toMatch)
    }
    return commonPrefix
}
What's interesting to note:
this runs in O(n x m), where n is the number of strings and m is the length of the shortest one.
this uses the String.Index data type and thus deals with grapheme clusters, which the Character type represents.
And given the function I needed to write in the first place:
/// Takes an array of Strings representing file system objects' absolute
/// paths and turns it into a new array with the minimum number of common
/// ancestors, possibly pushing the root of the tree as many levels downwards
/// as necessary
///
/// In other words, we compute the longest common prefix and remove it
func reify(fullPaths:[String]) -> [String] {
    let lcp = longestCommonPrefix(fullPaths)
    return fullPaths.map {
        return $0[lcp.endIndex ..< $0.endIndex]
    }
}
Here is a minimal unit test:
func testReifySimple() {
    let samplePaths:[String] = [
        "/root/some/file"
        , "/root/some/other/file"
        , "/root/another/file"
        , "/root/direct.file"
    ]
    let expectedPaths:[String] = [
        "some/file"
        , "some/other/file"
        , "another/file"
        , "direct.file"
    ]
    let reified = PathUtilities().reify(samplePaths)
    for (index, expected) in expectedPaths.enumerate(){
        XCTAssert(expected == reified[index], "failed match, \(expected) != \(reified[index])")
    }
}
Perhaps a more intuitive solution: feed the prefix found in the earlier iteration, together with the next string, into the next comparison, i.e. [[[w1, w2], w3], w4] and so on, where [] denotes the LCP of two strings.
public String findPrefixBetweenTwo(String A, String B) {
    for (int i = 0, j = 0; i < A.length() && j < B.length(); i++, j++) {
        if (A.charAt(i) != B.charAt(j)) {
            return i > 0 ? A.substring(0, i) : "";
        }
    }
    // Either one string is a prefix of the other, or they are the same.
    return (A.length() > B.length()) ? B : A;
}
public String longestCommonPrefix(ArrayList<String> A) {
    if (A.isEmpty()) return ""; // nothing to compare
    if (A.size() == 1) return A.get(0);
    String prefix = A.get(0);
    for (int i = 1; i < A.size(); i++) {
        prefix = findPrefixBetweenTwo(prefix, A.get(i)); // chain the earlier prefix
    }
    return prefix;
}

Resources