Algorithm to convert a string from one format to another

I was looking at a problem that asks to convert strings as follows:
s = "3[a]2[bc]", return "aaabcbc".
s = "3[a2[c]]", return "accaccacc".
s = "2[abc]3[cd]ef", return "abcabccdcdcdef".
I was able to understand how to do that.
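For reference, the forward direction (decoding) can be sketched with a stack: on `[` save the text built so far together with the pending repeat count, and on `]` pop them and append the inner text repeated that many times. A minimal Python sketch, assuming well-formed input with positive (possibly multi-digit) counts; `decode` is a hypothetical name:

```python
def decode(s: str) -> str:
    stack = []    # (text before '[', repeat count) for every open bracket
    current = ""  # text decoded so far at the current nesting level
    count = 0     # repeat count being read; counts may have several digits
    for ch in s:
        if ch.isdigit():
            count = count * 10 + int(ch)
        elif ch == "[":
            stack.append((current, count))
            current, count = "", 0
        elif ch == "]":
            prefix, times = stack.pop()
            current = prefix + current * times
        else:
            current += ch
    return current


print(decode("3[a2[c]]"))  # accaccacc
```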
I was wondering whether there is a way to do this in reverse. Given a string like abcabccdcdcdef, I understand there can be many possible representations. Can we find the representation that takes the least memory (not memory used by the algorithm, but the size of the final string)?

For maximum efficiency, we want as much reduction as possible. I think I would do something like this (it may not be the most efficient algorithm):
s = "whateverwhateveryouwantwantwantababababababababc"
possibilities = []
repeats = []
def findRepeats(repeats, s, length):
for i in range(0, len(s) - 2 * length + 1):
if s[i:i+length] == s[i+length:i+2*length]:
trackInd = i+length
times = 2
while trackInd+2*length <= len(s):
if (s[trackInd:trackInd+length]==s[trackInd+length:trackInd+2*length]):
times += 1
else: break
trackInd += length
repeats.append((i, times, s[i:i+length]))
return repeats
for i in range(0, len(s)):
repeats = findRepeats(repeats, s, i)
def formPossibility(repeats, s):
build = ""
i = 0
while i < len(s):
pass = True
for repeat in repeats:
if repeat[0] == i:
pass = False
build += repeat[1] + "["
build += repeat[2] + "]"
break
if pass:
build += s[i]
# I didn't finish this but you would loop through all the repeats and test
# them to see if they overlap, and then you would take all the posibilities
# of different ways to make them so that some are there, and some are not.
# in any case, I think that you get the idea.
# I couldn't finish this because I am doing the coding on stackoverflow and
# its like so painful and so hard to debug. also I don't have enough time sorry

I don't know whether this is the most efficient approach, or efficient at all, but here is my approach in JavaScript:
function format(pattern, length, times) {
    var result = "";
    if (times == 0) {
        result = pattern;
    } else {
        // times counts the copies after the first one
        result = (times + 1).toString() + "[" + pattern + "]";
    }
    return result;
}
function encode(input) {
    var result = "";
    var pattern = { length: 1, times: 0 };
    var i = 1;
    // find the prefix length i whose back-to-back copies cover the most characters
    while (i <= input.length / 2) {
        var subpattern = input.substr(0, i);
        var j = 0;
        while (input.substr(i + j * i, i) == subpattern && j + i < input.length) {
            j++;
        }
        if (i * j > pattern.length * pattern.times) {
            pattern.length = i;
            pattern.times = j;
        }
        i++;
    }
    // encode the repeated unit (recursively, if it is longer than one character)
    if (pattern.length > 1) {
        result = format(encode(input.substr(0, pattern.length)), pattern.length, pattern.times);
    } else {
        result = format(input.substr(0, pattern.length), pattern.length, pattern.times);
    }
    // then encode whatever follows the repeated run
    if (pattern.length + pattern.length * pattern.times < input.length) {
        result += encode(input.substr(pattern.length + pattern.length * pattern.times, input.length));
    }
    return result;
}
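Both answers above choose repeats greedily, so they do not guarantee the shortest result: for "aaabcbc" the encode function above returns "3[a]2[bc]" (9 characters), which is longer than the 7-character input. If "least memory" is taken to mean the shortest encoded string, one exact alternative is interval dynamic programming: for every substring, keep the shortest of (a) the raw text, (b) k[...] around its smallest repeating unit, and (c) the best split into two independently encoded parts. A sketch in Python, assuming the k[...] notation from the question; `shortest_encoding` is a hypothetical name, and the running time is roughly O(n^3) string work for a string of length n:

```python
def shortest_encoding(s: str) -> str:
    n = len(s)
    if n == 0:
        return ""
    # best[i][j] = shortest encoding found for s[i..j] (inclusive)
    best = [[""] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            sub = s[i:j + 1]
            cand = sub  # (a) keep the raw text
            # (b) (sub + sub).find(sub, 1) is the smallest period p; when
            # p < len(sub), sub is exactly len(sub) // p copies of sub[:p]
            p = (sub + sub).find(sub, 1)
            if p < length:
                rep = f"{length // p}[{best[i][i + p - 1]}]"
                if len(rep) < len(cand):
                    cand = rep
            # (c) split into two parts that are encoded independently
            for k in range(i, j):
                split = best[i][k] + best[k + 1][j]
                if len(split) < len(cand):
                    cand = split
            best[i][j] = cand
    return best[0][n - 1]


print(shortest_encoding("abababab"))  # 4[ab]
```

Because the repeat candidate is tried before the splits and a split only wins when strictly shorter, ties go to the k[...] form.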

Related

Codility PermMissingElem

My solution scored only 40% correctness on Codility.
What am I doing wrong?
Here is the test result (https://codility.com/demo/results/trainingU7KSSG-YNX/)
Problem:
A zero-indexed array A consisting of N different integers is given. The array contains integers in the range [1..(N + 1)], which means that exactly one element is missing.
Your goal is to find that missing element.
Solution:
function solution(A) {
    var output = 1;
    var arrayLength = A.length;
    if(!arrayLength){
        return output;
    }
    if(arrayLength == 1) {
        return A[0] + 1;
    }
    var sorted = A.sort(sortingFn);
    for(var i = 0; i < A.length - 1; i++) {
        if(A[i+1] - A[i] > 1) {
            output = A[i] + 1;
            break;
        }
    }
    return output;
}
function sortingFn(a, b) {
    return a - b;
}
Result
Your algorithm finds the missing element by comparing neighboring elements in the array. This means it cannot handle cases where the first or last element is missing, as these have only a single neighbor.
Consider [1, 2, 3] as an example. The missing element is 4. But since 4 has exactly one neighbor (3), the algorithm can't find it, and 1 is returned.
In addition, your algorithm is rather inefficient: sorting takes O(n log n), while the problem is solvable in O(n):
def find_missing(arr):
    s = sum(arr)
    expected = (len(arr) + 1) * (len(arr) + 2) // 2  # 1 + 2 + ... + (N + 1)
    return expected - s
This code sums all elements in the array and compares the result with the expected sum if all elements were present. The advantage of this approach is that it needs only a single linear pass and finds the missing element with relative simplicity.
Try this in C#:
using System;
using System.Linq;

public static class Solution
{
    public static int PermMissingElem(int[] A)
    {
        // empty input, or 1 itself is missing
        if (!A.Any() || !A.Any(x => x == 1)) { return 1; }
        var size = A.Length;
        var numberTwoList = Enumerable.Range(1, size);
        // values in 1..N that do not occur in A
        var failNumber = numberTwoList.Except(A);
        // 1..N all present, so N + 1 is missing
        if (!failNumber.Any()) { return A.Max() + 1; }
        return failNumber.FirstOrDefault();
    }
}
Well, when the last element is missing, you return 1, since your if statement's condition is always false. The same happens when the first element is missing.
Take this input as an example:
1 2 3 4 5
The difference is always 1, but element 6 is missing.
Your algorithm misses these cases because it only examines neighboring elements (A[i + 1] and A[i]).
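One way to keep the sort-then-scan idea while handling both ends is to compare each sorted value with its expected position instead of with its neighbor: a missing first element then shows up at index 0, and a missing last element falls through to N + 1. A minimal Python sketch (`perm_missing_elem` is a hypothetical name); it is still O(N log N) because of the sort:

```python
def perm_missing_elem(a):
    values = sorted(a)
    # in a gap-free prefix, values[i] == i + 1
    for i, v in enumerate(values):
        if v != i + 1:
            return i + 1  # also covers a missing first element (i == 0)
    return len(values) + 1  # no gap inside, so N + 1 is missing


print(perm_missing_elem([1, 2, 3, 4, 5]))  # 6
```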
JS solution #1
function solution(A) {
    if (A.length === 1) {
        return A[0] > 1 ? 1 : 2;
    }
    const map = {};
    let max = 0;
    for (let i = 0, len = A.length; i < len; i++) {
        map[A[i]] = A[i];
        if (A[i] > max) {
            max = A[i];
        }
    }
    for (let i = 0, len = A.length; i < len; i++) {
        if (!map[i + 1]) {
            return i + 1;
        }
    }
    return max + 1;
}
JS solution #2
function solution(A) {
    const s = A.reduce((a, b) => a + b, 0);
    const s2 = (A.length + 1) * (A.length + 2) / 2;
    return s2 - s;
}
Try these arrays (like the final tests):
[1,2,3] -> must return 4;
[1] -> must return 2;
[2] -> must return 1;
[2,3] -> must return 1;
[1, 3] -> must return 2;
But solution #2 returns -1 for [4] and -120 for [123] (inputs outside the guaranteed range [1..N+1]). The test still shows 100 points, but in my opinion it doesn't work as expected.
Both solutions run in O(N) time.
Try this JavaScript function:
function solution(A) {
    let givenSum = 0;
    let expectedSum = 0;
    let size = A.length;
    for (let i = 1; i <= size + 1; i++) {
        expectedSum = expectedSum + i;
    }
    for (let i = 0; i < size; i++) {
        givenSum += A[i];
    }
    return expectedSum - givenSum;
}
Here is my solution:
https://app.codility.com/demo/results/trainingMZWVVT-55Y/
function solution(A) {
    A = A.sort((a, b) => a - b);
    if (A[0] !== 1) return 1;
    for (let i = 0; i < A.length; i++) {
        // at the last index A[i + 1] is undefined, the difference is NaN,
        // so the loop returns A[A.length - 1] + 1 there
        if (A[i + 1] - A[i] !== 1) return A[i] + 1;
    }
    return A[A.length - 1] + 1;
}
It scored 100% on Codility (see the link above).
My solution uses a set difference, since the problem guarantees that exactly one element is missing.
def solution(A):
    # write your code in Python 3.6
    N = len(A)
    difference = set(range(1, N+2)) - set(A)
    return difference.pop()

Confusion related to the time complexity of this algorithm

I was going through some LeetCode articles. Here is one of them: https://leetcode.com/articles/optimal-division/.
Given a list of positive integers, adjacent integers are combined by floating-point division. For example, [2,3,4] -> 2 / 3 / 4.
However, you can add any number of parentheses at any position to change the order of operations. You should find out how to add parentheses to get the maximum result, and return the corresponding expression as a string. Your expression should NOT contain redundant parentheses.
Example:
Input: [1000,100,10,2]
Output: "1000/(100/10/2)"
Explanation:
1000/(100/10/2) = 1000/((100/10)/2) = 200
However, the inner parentheses around 100/10 in "1000/((100/10)/2)" are redundant,
since they don't change the order of operations. So you should return "1000/(100/10/2)".
Other cases:
1000/(100/10)/2 = 50
1000/(100/(10/2)) = 50
1000/100/10/2 = 0.5
1000/100/(10/2) = 2
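The examples point to a shortcut: whatever the parentheses, the first number always ends up in the numerator and the second in the denominator. Since every number is a positive integer (so at least 1), the largest possible value is nums[0] * nums[2] * ... * nums[n-1] / nums[1], which is exactly what nums[0]/(nums[1]/nums[2]/.../nums[n-1]) evaluates to; for example 1000/(100/10/2) = 1000 * 10 * 2 / 100 = 200. A Python sketch of this O(N) construction (`optimal_division` is a hypothetical name; with one or two numbers no parentheses are needed):

```python
from typing import List


def optimal_division(nums: List[int]) -> str:
    parts = [str(x) for x in nums]
    if len(parts) <= 2:
        # "a" or "a/b": nothing to rearrange, and parentheses would be redundant
        return "/".join(parts)
    # a/(b/c/.../z) == a*c*...*z / b
    return parts[0] + "/(" + "/".join(parts[1:]) + ")"


print(optimal_division([1000, 100, 10, 2]))  # 1000/(100/10/2)
```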
I think the time complexity of the solution is O(N^2), isn't it?
Here is the memoization solution:
public class Solution {
    class T {
        float max_val, min_val;
        String min_str, max_str;
    }
    public String optimalDivision(int[] nums) {
        T[][] memo = new T[nums.length][nums.length];
        T t = optimal(nums, 0, nums.length - 1, "", memo);
        return t.max_str;
    }
    public T optimal(int[] nums, int start, int end, String res, T[][] memo) {
        if (memo[start][end] != null)
            return memo[start][end];
        T t = new T();
        if (start == end) {
            t.max_val = nums[start];
            t.min_val = nums[start];
            t.min_str = "" + nums[start];
            t.max_str = "" + nums[start];
            memo[start][end] = t;
            return t;
        }
        t.min_val = Float.MAX_VALUE;
        t.max_val = Float.MIN_VALUE;
        t.min_str = t.max_str = "";
        for (int i = start; i < end; i++) {
            T left = optimal(nums, start, i, "", memo);
            T right = optimal(nums, i + 1, end, "", memo);
            if (t.min_val > left.min_val / right.max_val) {
                t.min_val = left.min_val / right.max_val;
                t.min_str = left.min_str + "/" + (i + 1 != end ? "(" : "") + right.max_str + (i + 1 != end ? ")" : "");
            }
            if (t.max_val < left.max_val / right.min_val) {
                t.max_val = left.max_val / right.min_val;
                t.max_str = left.max_str + "/" + (i + 1 != end ? "(" : "") + right.min_str + (i + 1 != end ? ")" : "");
            }
        }
        memo[start][end] = t;
        return t;
    }
}
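On the complexity question: memoization means each pair (start, end) is solved only once, so there are O(N^2) subproblems, but each subproblem loops over every split point i in [start, end). The total number of loop iterations is therefore the number of (start, i, end) triples, (N^3 - N)/6, which is Θ(N^3) rather than O(N^2). That count also ignores string building: each concatenation of max_str/min_str may copy O(N) characters, so counting that cost gives an upper bound of O(N^4). The small Python sketch below (hypothetical helper `count_split_iterations`) mirrors only the memoized recursion structure and counts loop iterations, to check the formula empirically:

```python
from functools import lru_cache


def count_split_iterations(n: int) -> int:
    iterations = 0

    @lru_cache(maxsize=None)  # memoization: each (start, end) body runs once
    def optimal(start: int, end: int) -> None:
        nonlocal iterations
        if start == end:
            return
        for i in range(start, end):
            iterations += 1
            optimal(start, i)
            optimal(i + 1, end)

    if n > 0:
        optimal(0, n - 1)
    return iterations


for n in (4, 8, 16):
    print(n, count_split_iterations(n), (n ** 3 - n) // 6)
```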

Algorithm - find all permutations of string a in string b

Say we have
string a = "abc"
string b = "abcdcabaabccbaa"
Find the locations of all permutations of a in b. I am trying to find an efficient algorithm for this.
Pseudo code:
sort string a                       // O(a log a)
for each window of length a in b    // O(b)?
    sort that window of b           // O(~a log a)?
    compare to a
    if equal
        save the index
So would this be a correct algorithm? The running time would be around O(a log a + b * a log a) ~= O(b * a log a)? How efficient would this be? Is there a way to reduce it to O(a * b) or better?
Sorting is very expensive, and it doesn't use the fact that you move along b with a sliding window.
I would use a comparison method that is location agnostic (since any permutation is valid): assign each letter a prime number, and represent each string by the product of its letters' primes.
This way, as you go over b, each step only requires dividing by the letter you remove on the left and multiplying by the next letter.
You also need to convince yourself that this indeed matches uniquely for each string and covers all permutations; this follows from the uniqueness of prime factorization. Also note that on larger strings the numbers get big, so you may need a library for large numbers.
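A minimal Python sketch of this rolling product, assuming `a` and `b` contain only lowercase ASCII letters (mapped to the first 26 primes). Python integers are unbounded, so no big-number library is needed. `prime_window_matches` is a hypothetical name; it returns the start indices of the matching windows:

```python
from typing import List

# first 26 primes, one per letter 'a'..'z'
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_window_matches(a: str, b: str) -> List[int]:
    def code(ch: str) -> int:
        return PRIMES[ord(ch) - ord("a")]

    m = len(a)
    if m == 0 or m > len(b):
        return []
    target = 1
    for ch in a:
        target *= code(ch)
    window = 1
    for ch in b[:m]:
        window *= code(ch)
    hits = [0] if window == target else []
    for i in range(m, len(b)):
        window //= code(b[i - m])  # exact division: that prime is a factor
        window *= code(b[i])
        if window == target:
            hits.append(i - m + 1)
    return hits


print(prime_window_matches("abc", "abcdcabaabccbaa"))  # [0, 4, 8, 11]
```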
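There is no need to hash: you can just count frequencies in your sliding window and check whether they match. Assuming the size of your alphabet is s, comparing the two count arrays at every step gives a very simple O(s(n + m)) algorithm; maintaining the number of matching counts, as below, brings it down to O(s + n + m).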
// a = [1 .. m] and b = [1 .. n] are the input
cnta = [1 .. s] array initialized to 0
cntb = [1 .. s] array initialized to 0
// nb_matches = the number of letters c s.t. cnta[c] = cntb[c]
// thus the current window is a permutation of a iff nb_matches = s
nb_matches = s
for i = 1 to m:
    if cnta[a[i]] = 0: nb_matches -= 1    // was equal to cntb (= 0), now differs
    cnta[a[i]] += 1
ans = 0
for i = 1 to n:
    // b[i] enters the window
    if cntb[b[i]] = cnta[b[i]]: nb_matches -= 1
    cntb[b[i]] += 1
    if cntb[b[i]] = cnta[b[i]]: nb_matches += 1
    // b[i - m] leaves the window once it holds more than m letters
    if i - m >= 1:
        if cntb[b[i - m]] = cnta[b[i - m]]: nb_matches -= 1
        cntb[b[i - m]] -= 1
        if cntb[b[i - m]] = cnta[b[i - m]]: nb_matches += 1
    // the window b[i - m + 1 .. i] is full and matches
    if i >= m and nb_matches = s: ans += 1
return ans
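A Python sketch of the same frequency-counting window, returning the number of matching windows like the pseudocode. Instead of nb_matches over a fixed alphabet of size s, it tracks `mismatched`, the number of characters whose window count differs from their count in `a` (so nb_matches = s - mismatched, and a window matches iff mismatched == 0); with `collections.Counter` this works for any characters. `count_perm_windows` is a hypothetical name, and treating an empty pattern as "no match" is an assumption:

```python
from collections import Counter


def count_perm_windows(a: str, b: str) -> int:
    m = len(a)
    if m == 0:
        return 0  # assumption: an empty pattern counts as no match
    need = Counter(a)
    have = Counter()
    mismatched = len(need)  # every letter of a is missing from the empty window

    def update(ch: str, delta: int) -> None:
        nonlocal mismatched
        if have[ch] == need[ch]:
            mismatched += 1  # ch was balanced and is about to change
        have[ch] += delta
        if have[ch] == need[ch]:
            mismatched -= 1  # ch is balanced again

    ans = 0
    for i, ch in enumerate(b):
        update(ch, +1)  # b[i] enters the window
        if i >= m:
            update(b[i - m], -1)  # b[i - m] leaves the window
        if i >= m - 1 and mismatched == 0:
            ans += 1
    return ans


print(count_perm_windows("abc", "abcdcabaabccbaa"))  # 4
```

Each step does O(1) work on average, so the whole scan is O(n + m).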
Write a function strcount(str, len, ch) that counts the occurrences of character ch in the first len characters of a string or substring str.
Then just pass through the search string:
for (i = 0; i < haystackN - needleN + 1; i++)
{
    for (j = 0; j < needleN; j++)
        if (strcount(haystack + i, needleN, needle[j]) != strcount(needle, needleN, needle[j]))
            break;
    if (j == needleN)
    {
        /* found a permutation starting at haystack + i */
    }
}
Below is my solution. The space complexity is just O(a + b), and the running time (if I calculated correctly) is O(b*a), since for each character in b we may recurse up to a levels deep.
md5's answer is a good one and will be faster!
public class FindPermutations {
    public static void main(String[] args) {
        System.out.println(numPerms("xacxzaa", "fxaazxacaaxzoecazxaxaz"));
        System.out.println(numPerms("ABCD", "BACDGABCDA"));
        System.out.println(numPerms("AABA", "AAABABAA"));
        // prints 4, then 3, then 3
    }

    public static int numPerms(final String a, final String b) {
        int sum = 0;
        for (int i = 0; i < b.length(); i++) {
            if (permPresent(a, b.substring(i))) {
                sum++;
            }
        }
        return sum;
    }

    // is a permutation of a present at the start of b?
    public static boolean permPresent(final String a, final String b) {
        if (a.isEmpty()) {
            return true;
        }
        if (b.isEmpty()) {
            return false;
        }
        final char first = b.charAt(0);
        if (a.contains(b.substring(0, 1))) {
            // super ugly, but removes first from a
            return permPresent(a.substring(0, a.indexOf(first)) + a.substring(a.indexOf(first) + 1, a.length()),
                    b.substring(1));
        }
        return false;
    }
}
For searchability's sake: I arrived on this page after looking for other solutions to compare mine to. The problem comes from this clip: https://www.hackerrank.com/domains/tutorials/cracking-the-coding-interview. The original problem statement was something like 'find all permutations of s in b'.
Use two hash tables with a sliding window whose size equals the length of the smaller string:
#include <string>
#include <unordered_map>
using namespace std;

int premutations_of_B_in_A(string large, string small) {
    if (large.length() < small.length())
        return 0;  // no window fits
    unordered_map<char, int> characters_in_large;
    unordered_map<char, int> characters_in_small;
    int ans = 0;
    for (char c : small) {
        characters_in_small[c]++;
    }
    // fill the first window
    for (size_t i = 0; i < small.length(); i++) {
        characters_in_large[large[i]]++;
    }
    ans += (characters_in_small == characters_in_large);
    // slide: add large[i], drop large[i - small.length()]
    for (size_t i = small.length(); i < large.length(); i++) {
        characters_in_large[large[i]]++;
        if (characters_in_large[large[i - small.length()]]-- == 1)
            characters_in_large.erase(large[i - small.length()]);
        ans += (characters_in_small == characters_in_large);
    }
    return ans;
}
This is not quite a complete solution, but it will help you count the occurrences of permutations of the small string in the larger string.
It handles lowercase characters only.
This solution has:
Time complexity: O(L),
where L is the length of the larger input. Strictly it is O(26 * L), since each full window compares 26 counters, but the constant factor is dropped.
Space complexity: O(1),
because the 26 counters are a constant, independent of the input size.
#include <iostream>
#include <string>
using namespace std;

int findAllPermutations(string small, string larger) {
    int freqSmall[26] = {0};
    //window size
    int n = small.length();
    //to return
    int finalAns = 0;
    for (char a : small) {
        freqSmall[a - 'a']++;
    }
    int freqlarger[26] = {0};
    int count = 0;  // characters currently in the window
    int j = 0;      // left edge of the window
    for (size_t i = 0; i < larger.length(); i++) {
        freqlarger[larger[i] - 'a']++;
        count++;
        if (count == n) {
            int k;
            for (k = 0; k < 26; k++) {
                if (freqlarger[k] != freqSmall[k]) {
                    break;
                }
            }
            if (k == 26) {
                finalAns++;
            }
            // drop the leftmost character so the next step sees a full window again
            freqlarger[larger[j] - 'a']--;
            j++;
            count--;
        }
    }
    return finalAns;
}
int main() {
    string s, t;
    cin >> s >> t;
    cout << findAllPermutations(s, t) << endl;
    return 0;
}

Solve number of substrings having two unique characters in O(n)

I'm working on a series of substring problems:
Given a string:
Find the substring containing only two unique characters that has maximum length.
Find the number of all substrings containing AT MOST two unique characters.
Find the number of all substrings containing two unique characters.
Problems 1 and 2 seem to have O(n) solutions, but I cannot think of an O(n) solution for problem 3. (Here is the solution for problem 2 and here is the one for problem 1.)
So I would like to know: does an O(n) solution for problem 3 exist?
Adding sample input/output for problem 3:
Given: abbac
Return: 6
Because there are 6 substrings containing two unique chars:
ab,abb,abba,bba,ba,ac
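One O(n) approach: count the substrings with at most two distinct characters using a sliding window (the technique used for problem 2), then subtract the substrings with exactly one distinct character, of which a run of r equal characters contributes r(r + 1)/2. For "abbac" that is 12 - 6 = 6, matching the expected output. A Python sketch (`count_exactly_two` is a hypothetical name); as a cross-check, it returns 7 for "abaac" (ab, aba, abaa, ba, baa, aac, ac):

```python
from collections import defaultdict


def count_exactly_two(s: str) -> int:
    # substrings with at most 2 distinct characters (sliding window)
    at_most_two = 0
    counts = defaultdict(int)
    left = 0
    for right, ch in enumerate(s):
        counts[ch] += 1
        while len(counts) > 2:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        at_most_two += right - left + 1  # valid substrings ending at `right`
    # substrings with exactly 1 distinct character
    exactly_one = 0
    run = 0
    for i, ch in enumerate(s):
        run = run + 1 if i > 0 and ch == s[i - 1] else 1
        exactly_one += run  # adds up to r * (r + 1) / 2 over a run of length r
    return at_most_two - exactly_one


print(count_exactly_two("abbac"))  # 6
```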
Find the number of all substrings containing two unique characters.
Edit: I misread the question. This solution counts the substrings with at least 2 unique characters.
1. The number of substrings of a word of length len is len * (len + 1) / 2:
sum = len * (len + 1) / 2
2. We are looking for substrings longer than 1, but the formula above includes the len substrings of length 1, so we subtract them. The number of substrings of length at least 2 is then:
sum = len * (len + 1) / 2 - len
3. For each maximal run of identical characters, apply steps 1 and 2 to the run's length and subtract the result from the sum obtained in step 2.
Sample implementation follows.
public static int allUniq2Substrings(char s[]) {
    int sum = s.length * (s.length + 1) / 2 - s.length;
    int sameRun = 0;
    for (int i = 0, prev = -1; i < s.length; prev = s[i++]) {
        if (s[i] != prev) {
            sum -= sameRun * (sameRun + 1) / 2 - sameRun;
            sameRun = 1;
        } else {
            sameRun++;
        }
    }
    return sum - (sameRun * (sameRun + 1) / 2 - sameRun);
}
allUniq2Substrings("aaac".toCharArray());
3
allUniq2Substrings("aabc".toCharArray());
5
allUniq2Substrings("aaa".toCharArray());
0
allUniq2Substrings("abcd".toCharArray());
6
Edit
Let me try this again. I use the above 3 invariants.
This is a subproblem of finding all substrings which contain at least 2 unique characters.
I have a method posted above that counts such substrings for a string of any length. I will apply it to windows of the string that contain 2 unique characters.
We only need to keep track of the longest consecutive run of characters whose set size is 2, i.e., any permutation of 2 unique characters. The sum over such runs gives the total number of desired substrings.
public static int allUniq2Substrings(char s[]) {
    int sum = s.length * (s.length + 1) / 2 - s.length;
    int sameRun = 0;
    for (int i = 0, prev = -1; i < s.length; prev = s[i++]) {
        if (s[i] != prev) {
            sum -= sameRun * (sameRun + 1) / 2 - sameRun;
            sameRun = 1;
        } else {
            sameRun++;
        }
    }
    return sum - (sameRun * (sameRun + 1) / 2 - sameRun);
}
// needs: import java.util.Arrays;
public static int uniq2substring(char s[]) {
    int last = 0, secondLast = 0;
    int sum = 0;
    for (int i = 1; i < s.length; i++) {
        if (s[i] != s[i - 1]) {
            last = i;
            break;
        }
    }
    boolean OneTwo = false;
    int oneTwoIdx = -1; //alternating pattern
    for (int i = last + 1; i < s.length; ++i) {
        if (s[secondLast] != s[i] && s[last] != s[i]) { //detected more than 2 uniq chars
            sum += allUniq2Substrings(Arrays.copyOfRange(s, secondLast, i));
            secondLast = last;
            last = i;
            if (OneTwo) {
                secondLast = oneTwoIdx;
            }
            OneTwo = false;
        } else if (s[i] != s[last]) { //alternating pattern detected a*b*a
            OneTwo = true;
            oneTwoIdx = i;
        }
    }
    return sum + allUniq2Substrings(Arrays.copyOfRange(s, secondLast, s.length));
}
uniq2substring("abaac".toCharArray())
6
uniq2substring("aab".toCharArray())
2
uniq2substring("aabb".toCharArray())
4
uniq2substring("ab".toCharArray())
1
I think the solution you linked for problem 2,
http://coders-stop.blogspot.in/2012/09/directi-online-test-number-of.html
can easily be adapted to the third problem as well.
Just modify the driver program as follows (modified lines are marked with a comment):
int numberOfSubstrings ( string A ) {
    int len = A.length();
    int res = 0, j = 1, c = 1, a[2][2];
    a[0][0] = A[0]; a[0][1] = 1;
    for(int i=0;i<len;i++) {
        int start = -1;                              // modified
        for (;j<len; j++) {
            c = isInArray(a, c, A[j]);
            if (c == 2 && start == -1) start = j;    // modified
            if(c == -1) break;
        }
        c = removeFromArray(a,A[i]);                 // modified
        res = (res + j - start);
    }
    return res;
}
The complete explanation of the derivation can be found in the link itself :)

Minimize the sequence by putting appropriate operations ' DP'

Given a sequence, say,
222
We have to put a '+' or '*' between each adjacent pair.
'*' has higher precedence than '+'.
We have to output the string whose evaluation gives the minimum value.
If more than one string does, the output must be the lexicographically smallest.
Input: 222
Output: 2*2+2
Explanation:
2+2+2=6
2+2*2=6
2*2+2=6
Of these, the 3rd is lexicographically smallest ('*' comes before '+' in ASCII).
I was wondering how to construct a DP solution for this.
Let DP[i] be the smallest value we can obtain from the elements input[i..N-1], i.e. the suffix starting at index i. I will do a recursive implementation with memoization, in C++:
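#include <algorithm>
#include <climits>
#include <string>
#include <vector>
using namespace std;

int N;                   // number of digits
vector<long long> input; // the digits of the sequence
vector<long long> DP;    // DP[i]: minimum value for input[i..N-1]; -1 = not computed yet

long long solve(int index)
{
    if (index == N)
        return 0;
    if (DP[index] != -1)
        return DP[index];
    long long result = LLONG_MAX;
    //put a + sign
    result = min(result, input[index] + solve(index + 1));
    //put consecutive * signs: input[index] * ... * input[i]
    long long cur = input[index];
    for (int i = index + 1; i < N; i++)
    {
        cur *= input[i];
        result = min(result, cur + solve(i + 1));
    }
    return DP[index] = result;
}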
Initialize DP with N + 1 entries set to -1, set DP[N] = 0 (reconstruct reads it), and call solve(0).
You can then reconstruct the solution. I haven't tested it, and I may have missed an edge case, but it should put you on the right track.
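string reconstruct(int index)
{
    if (index == N)
        return "";
    if (index == N - 1)  // last number: no operator follows it
        return to_string(input[index]);
    //put consecutive * signs; a longer run is lexicographically smaller
    //('*' < '+'), so remember the longest run that still reaches the minimum
    long long cur = input[index];
    string temp = to_string(input[index]), bestRun = "";
    int bestEnd = -1;
    for (int i = index + 1; i < N; i++)
    {
        cur *= input[i];
        temp += "*" + to_string(input[i]);
        if (DP[index] == cur + DP[i + 1])
        {
            bestEnd = i;
            bestRun = temp;
        }
    }
    if (bestEnd != -1)
        return bestRun + (bestEnd + 1 < N ? "+" : "") + reconstruct(bestEnd + 1);
    //put a + sign
    return to_string(input[index]) + "+" + reconstruct(index + 1);
}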
string result = reconstruct(0);
P.S Sorry for the many edits.
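For comparison, a bottom-up Python sketch of the same recurrence that resolves ties directly: each entry stores a (value, string) pair for the suffix starting at i, and Python's tuple ordering picks the smallest value first and then the lexicographically smallest string ('*' sorts before '+' in ASCII). It assumes the input is a string of single digits; `min_expression` is a hypothetical name:

```python
def min_expression(digits: str) -> str:
    nums = [int(c) for c in digits]
    n = len(nums)
    # best[i] = (smallest value, lexicographically smallest string) for nums[i:]
    best = [(0, "")] * (n + 1)
    for i in range(n - 1, -1, -1):
        options = []
        product = 1
        for j in range(i, n):
            product *= nums[j]  # the run nums[i] * ... * nums[j]
            run = "*".join(str(x) for x in nums[i:j + 1])
            tail_value, tail_text = best[j + 1]
            text = run + ("+" + tail_text if j + 1 < n else "")
            options.append((product + tail_value, text))
        best[i] = min(options)
    return best[0][1]


print(min_expression("222"))  # 2*2+2
```

Strings built for the same digits always have the same length, so comparing them lexicographically never runs into one being a prefix of another.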
