I am writing a multi-way trie that will load in a dictionary that will take words and phrases. So first the dictionary will be loaded into the trie.
This is some (almost) C++ adapted from the following article:
http://www.toptal.com/java/the-trie-a-neglected-data-structure
That one is written in Java, so I've translated it to C++ for you.
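#include <vector>
struct Alphabet{
    const char* x = "abcdefghijklmnopqrstuvwxyz";
    // Returns the index (0-25) of the character *s, or -1 if it is not a lowercase letter.
    int findIndex(const char* s) const {
        for(int i = 0; i < 26; ++i){
            if(x[i] == *s){
                return i;
            }
        }
        return -1;
    }
};
struct MWTrieNode{
    // One slot per letter; a null pointer means "no such child".
    std::vector<MWTrieNode*> children = std::vector<MWTrieNode*>(26, nullptr);
    bool isLast = false;
};
MWTrieNode* getWord(const char* s, int len, MWTrieNode* root){
    MWTrieNode* node = root;
    Alphabet a;
    for(int i = 0; i < len; i++){
        const char* currChar = &s[i];
        int index = a.findIndex(currChar);
        if(index < 0){
            // Character is outside the alphabet, so no word can match
            return NULL;
        }
        MWTrieNode* child = node->children[index];
        if(!child){
            // No such word
            return NULL;
        }
        // step into the MWTrieNode
        node = child;
    }
    return node;
}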
// * corrected comparison between char* and char. (using *(char*))
You can modify the getWord function to take in some parameters to modify how you return your words.
But this should get you started.
For completions, you'll need to find all of the words below a certain prefix (I'd imagine). So you'd want to build several "sub trees" starting with the root of your search prefix (i.e. "House" --> "Housewife", "Housing", "Household", etc.).
If you pass in 'Ho', you will find a 'partial word' with no node saying it's at the end. At this point, you can start at the 'o' node.
The part you haven't mentioned is how you store words that are both a word by themselves and a portion of a longer word, for example, Home, Homeowner, Homewrecker.
Those nodes must have isLast == true, but also have child nodes. It is this special case that should help you find multiple options for auto-complete. You're running the getWord method several times for a single search, with different prefixes and conditions. The result should be a list of words that all have the prefix you desire.
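To make the completion idea concrete, here is a minimal sketch in Python rather than C++. The names TrieNode, insert, find_node and completions are made up for illustration, and it uses a dict per node instead of a fixed 26-slot array. It walks down to the node for the prefix, then collects every word below it whose node is marked as the end of a word.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_last = False  # True if a word ends at this node


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_last = True


def find_node(root, prefix):
    # Returns the node reached by following prefix, or None if the path does not exist.
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def completions(root, prefix):
    # Collects every stored word that starts with prefix (including prefix itself).
    start = find_node(root, prefix)
    if start is None:
        return []
    results = []
    stack = [(start, prefix)]
    while stack:
        node, word = stack.pop()
        if node.is_last:
            results.append(word)
        for ch, child in node.children.items():
            stack.append((child, word + ch))
    return sorted(results)


root = TrieNode()
for w in ["home", "homeowner", "homewrecker", "house", "housing"]:
    insert(root, w)
print(completions(root, "home"))
```

Note that "home" is returned even though its node has children, which is exactly the isLast-with-children case described above.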
I'm sure that professors Alvarado and Mirza will be highly interested in the contents of your post.
It is an interview question. Given an array, e.g., [3,2,1,2,7], we want to make all elements in this array unique by incrementing duplicate elements, and we require that the sum of the refined array be minimal. For example, the answer for [3,2,1,2,7] is [3,2,1,4,7] and its sum is 17. Any ideas?
It's not quite as simple as my earlier comment suggested, but it's not terrifically complicated.
First, sort the input array. If it matters to be able to recover the original order of the elements then record the permutation used for the sort.
Second, scan the sorted array from left to right (ie from low to high). If an element is less than or equal to the element to its left, set it to be one greater than that element.
Pseudocode
sar = sort(input_array)
for index = 2:size(sar) ! I count from 1
if sar(index)<=sar(index-1) sar(index) = sar(index-1)+1
forend
Is the sum of the result minimal? I've convinced myself that it is through some head-scratching and trials, but I haven't got a formal proof.
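The sort-and-bump idea above could look roughly like this in Python. The function name make_unique_min_sum is made up, and sorting the indices is just one way to "record the permutation" so the result comes back in the original order. Treat it as a sketch rather than a proven-optimal implementation.

```python
def make_unique_min_sum(values):
    # Visit the indices in order of increasing value (stable for ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = list(values)
    previous = None
    for i in order:
        # If this element does not exceed the previous one, bump it just past it.
        if previous is not None and result[i] <= previous:
            result[i] = previous + 1
        previous = result[i]
    return result


unique = make_unique_min_sum([3, 2, 1, 2, 7])
print(unique, sum(unique))
```

For [3,2,1,2,7] this produces a different arrangement than the one in the question, but with the same sum of 17.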
If you only need to find ONE of the best solutions, here's the algorithm with some explanations.
The idea of this problem is to find an optimal solution, which can be found only by testing all existing solutions (well, they're infinite, let's stick with the reasonable ones).
I wrote a program in C, because I'm familiar with it, but you can port it to any language you want.
The program does this: it tries to increment one value to the max possible (I'll explain how to find it in the comments under the code sections), then if the solution is not found, decreases this value and goes on with the next one and so on.
It's an exponential algorithm, so it will be very slow on large values of duplicated data (yet, it assures you the best solution is found).
I tested this code with your example, and it worked; not sure if there's any bug left, but the code (in C) is this.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef int BOOL; //just to ease meanings of values
#define TRUE 1
#define FALSE 0
Just to ease comprehension, I did some typedefs. Don't worry.
typedef struct duplicate { //used to speed up the algorithm; it uses some more memory just to assure it's ok
    int value;
    BOOL duplicate;
} duplicate_t;
int maxInArrayExcept(int *array, int arraySize, int index); //find the max value in array except the value at the index given
//the result is the max value in the array, not counting the index
int *findDuplicateSum(int *array, int arraySize);
BOOL findDuplicateSum_R(duplicate_t *array, int arraySize, int *tempSolution, int *solution, int *totalSum, int currentSum); //recursive function used to find solution
BOOL check(int *array, int arraySize); //checks if there's any repeated value in the solution
These are all the functions we'll need, split up for ease of comprehension.
First, we have a struct. This struct is used to avoid checking, for every iteration, if the value on a given index was originally duplicated. We don't want to modify any value not duplicated originally.
Then, we have a couple of functions. First, we need to handle the worst case scenario, where every value after the duplicated ones is already occupied: then we need to increment the duplicated value up to the maximum value reached + 1.
Then, there are the main functions, which we'll discuss later.
The check function only checks if there's any duplicated value in a temporary solution.
int main() { //testing purpose
int i;
int testArray[] = { 3,2,1,2,7 }; //test array
int nTestArraySize = 5; //test array size
int *solutionArray; //needed if you want to use the solution later
solutionArray = findDuplicateSum(testArray, nTestArraySize);
for (i = 0; i < nTestArraySize; ++i) {
printf("%d ", solutionArray[i]);
}
return 0;
}
This is the main Function: I used it to test everything.
int * findDuplicateSum(int * array, int arraySize)
{
    int *solution = malloc(sizeof(int) * arraySize);
    int *tempSolution = malloc(sizeof(int) * arraySize);
    duplicate_t *duplicate = calloc(arraySize, sizeof(duplicate_t));
    int i, j, currentSum = 0, totalSum = INT_MAX;
    for (i = 0; i < arraySize; ++i) {
        tempSolution[i] = solution[i] = duplicate[i].value = array[i];
        currentSum += array[i];
        for (j = 0; j < i; ++j) { //to find ALL the best solutions, we should also put the first found value as true; it's just a line more
            //yet, it saves the algorithm half of the duplicated numbers (best/this case scenario)
            if (array[j] == duplicate[i].value) {
                duplicate[i].duplicate = TRUE;
            }
        }
    }
    if (!findDuplicateSum_R(duplicate, arraySize, tempSolution, solution, &totalSum, currentSum)) {
        printf("No solution found\n");
    }
    free(tempSolution);
    free(duplicate);
    return solution; //the caller owns this array and should free() it
}
This function does a lot of things: first, it sets up the solution array, then it initializes both the solution values and the duplicate array, which is the one used to check for duplicated values at startup. Then, we find the current sum and we set the maximum available sum to the maximum integer possible.
Then the recursive function is called; it tells us whether a solution was found (which should always be the case), and we return the solution as an array.
int findDuplicateSum_R(duplicate_t * array, int arraySize, int * tempSolution, int * solution, int * totalSum, int currentSum)
{
    int i;
    BOOL found = FALSE;
    if (check(tempSolution, arraySize)) {
        if (currentSum < *totalSum) { //optimal solution checking
            for (i = 0; i < arraySize; ++i) {
                solution[i] = tempSolution[i];
            }
            *totalSum = currentSum;
        }
        return TRUE; //just to ensure a solution is found
    }
    for (i = 0; i < arraySize; ++i) {
        if (array[i].duplicate == TRUE) {
            //worst case scenario, you need it to stop the recursion on that value:
            //compare the candidate VALUE (not the duplicate flag) against an upper bound;
            //adding arraySize leaves room when several duplicates must climb past the maximum
            if (tempSolution[i] < maxInArrayExcept(solution, arraySize, i) + arraySize) {
                tempSolution[i]++;
                //do not return directly here, otherwise the backtracking below is never reached
                if (findDuplicateSum_R(array, arraySize, tempSolution, solution, totalSum, currentSum + 1)) {
                    found = TRUE;
                }
                tempSolution[i]--; //backtracking
            }
        }
    }
    return found; //FALSE only if no branch reached a valid solution
}
This is the recursive Function. It first checks if the solution is ok and if it is the best one found until now. Then, if everything is correct, it updates the actual solution with the temporary values, and updates the optimal condition.
Then, we iterate on every repeated value (the if excludes other indexes) and we progress in the recursion until (if unlucky) we reach the worst case scenario: the check condition not satisfied above the maximum value.
Then we have to backtrack and continue with the iteration, that will go on with other values.
PS: an optimization is possible here, if we move the optimal condition from the check into the for: if the solution is already not optimal, we can't expect to find a better one just adding things.
The hard part is over; here are the supporting functions:
int maxInArrayExcept(int *array, int arraySize, int index) {
int i, max = 0;
for (i = 0; i < arraySize; ++i) {
if (i != index) {
if (array[i] > max) {
max = array[i];
}
}
}
return max;
}
BOOL check(int *array, int arraySize) {
int i, j;
for (i = 0; i < arraySize; ++i) {
for (j = 0; j < i; ++j) {
if (array[i] == array[j]) return FALSE;
}
}
return TRUE;
}
I hope this was useful.
Write if anything is unclear.
Well, I got the same question in one of my interviews.
Not sure if you still need it. But here's how I did it. And it worked well.
num_list1 = [2,8,3,6,3,5,3,5,9,4]
def UniqueMinSumArray(num_list):
    # Note: this modifies num_list in place.
    for i in range(len(num_list)):
        # Bump a duplicated value one step at a time until it collides with nothing else.
        # (Jumping straight past the current maximum would overshoot on inputs such as [5,5,1,1].)
        while num_list.count(num_list[i]) > 1:
            num_list[i] += 1
    return num_list
print (sum(UniqueMinSumArray(num_list1)))
You can try with your list of numbers and I am sure it will give you the correct unique minimum sum.
I got the same interview question too. But my answer is in JS in case anyone is interested.
For sure it can be improved to get rid of for loop.
function getMinimumUniqueSum(arr) {
  // [1,1,2] => [1,2,3] = 6
  // [1,2,2,3,3] => [1,2,3,4,5] = 15
  if (arr.length === 0) {
    return 0;
  }
  var sortedArr = [...arr].sort((a, b) => a - b);
  var current = sortedArr[0];
  var res = [current];
  for (var i = 1; i < sortedArr.length; i++) {
    if (sortedArr[i] > current) {
      // already larger than everything used so far: keep it
      current = sortedArr[i];
    } else {
      // equal to or smaller than the last used value: take the next free one
      current++;
    }
    res.push(current);
  }
  return res.reduce((a, b) => a + b, 0);
}
This was asked in an Amazon telephone interview: "Can you write a program (in your preferred language, C/C++/etc.) to find a given word in a large string buffer, i.e. its number of occurrences?"
I am still looking for the perfect answer that I should have given to the interviewer. I tried to write a linear search (char by char comparison) and obviously I was rejected.
Given 40-45 minutes for a telephone interview, what was the perfect algorithm he/she was looking for?
The KMP Algorithm is a popular string matching algorithm.
KMP Algorithm
Checking char by char is inefficient. If the string has 1000 characters and the keyword has 100 characters, you don't want to perform unnecessary comparisons. The KMP Algorithm handles many cases which can occur, but I imagine the interviewer was looking for the case where: When you begin (pass 1), the first 99 characters match, but the 100th character doesn't match. Now, for pass 2, instead of performing the entire comparison from character 2, you have enough information to deduce where the next possible match can begin.
// C program for implementation of KMP pattern searching
// algorithm
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
void computeLPSArray(char *pat, int M, int *lps);
void KMPSearch(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
// create lps[] that will hold the longest prefix suffix
// values for pattern
int *lps = (int *)malloc(sizeof(int)*M);
int j = 0; // index for pat[]
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
while (i < N)
{
if (pat[j] == txt[i])
{
j++;
i++;
}
if (j == M)
{
printf("Found pattern at index %d \n", i-j);
j = lps[j-1];
}
// mismatch after j matches
else if (i < N && pat[j] != txt[i])
{
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j-1];
else
i = i+1;
}
}
free(lps); // to avoid memory leak
}
void computeLPSArray(char *pat, int M, int *lps)
{
int len = 0; // length of the previous longest prefix suffix
int i;
lps[0] = 0; // lps[0] is always 0
i = 1;
// the loop calculates lps[i] for i = 1 to M-1
while (i < M)
{
if (pat[i] == pat[len])
{
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
if (len != 0)
{
// This is tricky. Consider the example
// AAACAAAA and i = 7.
len = lps[len-1];
// Also, note that we do not increment i here
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
// Driver program to test above function
int main()
{
char *txt = "ABABDABACDABABCABAB";
char *pat = "ABABCABAB";
KMPSearch(pat, txt);
return 0;
}
This code is taken from a really good site that teaches algorithms:
Geeks for Geeks KMP
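Since the original question asked for the number of occurrences rather than their positions, here is a small Python sketch of the same KMP idea that returns a count. The function names build_lps and count_occurrences are mine, not from the linked site. Overlapping matches are counted, which is an assumption about what the interviewer wanted.

```python
def build_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix.
    lps = [0] * len(pattern)
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]  # fall back without advancing i
        else:
            lps[i] = 0
            i += 1
    return lps


def count_occurrences(text, pattern):
    # Counts (possibly overlapping) occurrences of pattern in text.
    if not pattern:
        return 0
    lps = build_lps(pattern)
    count = 0
    j = 0  # number of pattern characters currently matched
    for ch in text:
        while j and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            count += 1
            j = lps[j - 1]
    return count


print(count_occurrences("ABABDABACDABABCABAB", "ABABCABAB"))
```

Each character of the text is read once and the fallback steps are bounded by the number of matches made, so the scan stays linear in the text length plus the pattern length.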
Amazon and similar companies expect knowledge of the Boyer–Moore string search and/or Knuth–Morris–Pratt algorithms.
Those are good if you want to show perfect knowledge. Otherwise, try to be creative and write something relatively elegant and efficient.
Did you ask about delimiters before you wrote anything? Extra information about the string buffer may well simplify your task.
Even the code below could be OK (it's really not) if you provide enough information in advance and properly explain the runtime, space requirements, and choice of data containers.
#include <sstream>
#include <string>
#include <unordered_map>
int find( const std::string & the_word, const std::string & text )
{
    std::stringstream ss( text ); // !!! could be really bad idea if 'text' is really big
    std::string word;
    std::unordered_map< std::string, int > umap;
    while( ss >> word ) ++umap[word]; // you have to assume that each word is separated by white-spaces.
    return umap[the_word];
}
Given a String of words, say "OhMy", keep the uppercase letter fixed(unchanged) but we can change the position of lower case letter. Output all possible permutation.
eg. given "OhMy" it should output [ "OhMy", "OyMh"]
Here is what I did:
public static List<String> Permutation(String s){
List<String> res = new ArrayList<String>();
if (s == null || s.length() == 0){
return res;
}
StringBuilder path = new StringBuilder(s);
List<Character> candidates = new ArrayList<Character>();
List<Integer> position = new ArrayList<Integer>();
for (int i = 0; i < s.length(); i++){
char c = s.charAt(i);
if (Character.isAlphabetic(c) && Character.isLowerCase(c)){
candidates.add(c);
position.add(i);
}
}
boolean[] occurred = new boolean[candidates.size()];
helper(res, path, candidates, position, 0);
return res;
}
public static void helper(List<String> res, StringBuilder path, List<Character> candidates, List<Integer> position, int index){
if (index == position.size()){
res.add(path.toString());
return ;
}
for (int i = index; i < position.size(); i++){
for (int j = 0; j < candidates.size(); j++){
path.setCharAt(position.get(i), candidates.get(j));
char c = candidates.remove(j);
helper(res, path, candidates, position, index+1);
candidates.add(j, c);
}
}
}
for input "Abc"
it will have result [Abc, Acb, Acc, Acb]
Essentially, the outer loop is iterating every possible position, inner loop tries every possible candidates at each possible position.
I don't know why it has duplicates like "Acc, Acb".
It seems like the main point of your implicit question is how to efficiently enumerate all permutations of a given set, which you can read about online (there are several methods). If you can enumerate all permutations of the indices of lower case letters, then it's pretty straightforward to do the bookkeeping and merge each permutation of lower case letters with the original unchanged set of upper case letters, respecting the positions of the upper case letters, so you can output your strings. If you're having difficulty with that part, update your question and someone should be able to help you out.
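As a rough illustration of that bookkeeping, here is a Python sketch rather than Java. The function name permute_lowercase is made up. It permutes only the lower case letters and writes each permutation back into the lower case positions. A set removes repeats when the same letter occurs more than once.

```python
from itertools import permutations


def permute_lowercase(s):
    # Positions holding lower case letters; every other character stays fixed.
    positions = [i for i, c in enumerate(s) if c.islower()]
    letters = [s[i] for i in positions]
    results = set()
    for perm in permutations(letters):
        chars = list(s)
        for pos, c in zip(positions, perm):
            chars[pos] = c
        results.add("".join(chars))
    return sorted(results)


print(permute_lowercase("OhMy"))
print(permute_lowercase("Abc"))
```

As far as I can tell, the duplicates in the posted Java code come from the outer loop over i in helper. Each recursion level should fill exactly one position, position.get(index), but that loop also lets it overwrite later positions, so the same string is reached more than once and "Acc" appears.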
I want to modify the C++ code below to use a loop instead of recursion.
I know of 2 ways to modify it:
Learn from the code and make a loop algorithm. In this case I think the meaning of the code is to PrintB (except for leaves) and PrintA (except for the root) in level order. For a binary (search) tree, how can I traverse it from leaf to root in a loop (without a pointer to the parent)?
Use a stack to imitate the call stack. I couldn't get this to work; can you help me with some useful ideas?
void func(const Node& node) {
if (ShouldReturn(node)) {
return;
}
for (int i = 0; i < node.children_size(); ++i) {
const Node& child_node = node.child(i);
func(child_node);
PrintA();
}
PrintB();
}
Assuming you are using C++
For the stack part, let's say the code does the following.
If the node is a leaf, nothing.
Else do the same for each child, then PrintA after each.
Then PrintB.
So what if I adjusted the code a little? The adjustments are only there to fit the iterative way.
void func(const Node& node) {
    if(ShouldReturn(node)) return;
    PrintB();
    // Children are visited last-to-first so that the whole output is an exact mirror image.
    for(int i = node.children_size() - 1; i >= 0; --i) {
        PrintA();
        const Node& child_node = node.child(i);
        func(child_node);
    }
}
// This way should make it print As & Bs in reverse direction.
// Lets re-adjust the code even further.
void func(const Node& node, bool firstCall = true) {
    if(!firstCall) PrintA(); //Placed that here, as PrintA is always called if a new Node is called, but not for the root Node, that's why I added the firstCall.
    if(ShouldReturn(node)) return;
    PrintB();
    for(int i = node.children_size() - 1; i >= 0; --i) {
        const Node& child_node = node.child(i);
        func(child_node, false);
    }
}
That should reverse the order of printing A & B, I hope I'm not wrong :D
So, now I want to have 2 vectors.
// Lets define an enum
typedef enum{fprintA, fprintB} printType;
void func(const Node& node){
    std::vector<printType> stackOfPrints;
    // Pointers to const nodes; this assumes child(i) returns a reference to a node owned by the tree.
    std::vector<const Node*> stackOfNodes; stackOfNodes.push_back(&node);
    bool first = true; //As we don't need to PrintA before the root.
    while (!stackOfNodes.empty()){
        const Node& fNode = *stackOfNodes.back();
        stackOfNodes.pop_back();
        if (!first) stackOfPrints.push_back(fprintA); // If not root PrintA.
        first = false;
        if(ShouldReturn(fNode)) continue;
        stackOfPrints.push_back(fprintB);
        // Push the children first-to-last, so the last child is popped (and recorded) first;
        // the final reversal below then restores the original child order.
        for(int i = 0; i < fNode.children_size(); ++i){
            stackOfNodes.push_back(&fNode.child(i));
        }
    }
    // Printing the stackOfPrints in reverse order (remember we changed the code, to initially print As & Bs in reverse direction)
    // this way, it will make the function print them in the correct required order
    while(!stackOfPrints.empty()){
        switch(stackOfPrints.back()){
            case fprintA: PrintA(); break;
            case fprintB: PrintB(); break;
            default: break;
        }
        stackOfPrints.pop_back();
    }
}
Let's hope I wrote the code correctly. :) I hope it helps.
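One way to gain some confidence in a conversion like this is to run both versions on a small tree and compare the output. The Python sketch below does that with a made-up Node class, where the stop flag is a hypothetical stand-in for ShouldReturn and the prints are collected into a list. It mirrors the two-stack idea (record the prints mirrored, then reverse them) and is not a translation of the exact C++ above.

```python
class Node:
    def __init__(self, children=None, stop=False):
        self.children = children or []
        self.stop = stop  # hypothetical stand-in for ShouldReturn(node)


def recursive(node, out):
    # Same shape as the original recursive func: children first, A after each, then B.
    if node.stop:
        return
    for child in node.children:
        recursive(child, out)
        out.append("A")
    out.append("B")


def iterative(root):
    prints = []            # filled in mirrored order, reversed at the end
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if not is_root:
            prints.append("A")
        if node.stop:
            continue
        prints.append("B")
        # Push first-to-last so the last child is popped first.
        for child in node.children:
            stack.append((child, False))
    prints.reverse()
    return prints


tree = Node([Node(stop=True), Node([Node(stop=True)])])
expected = []
recursive(tree, expected)
print(expected == iterative(tree), "".join(expected))
```

The test tree is deliberately asymmetric, because with a symmetric tree a wrong child order would go unnoticed.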
I am looking for the most efficient algorithm to form all possible combinations of words from a string. For example:
Input String: forevercarrot
Output:
forever carrot
forever car rot
for ever carrot
for ever car rot
(All words should be from a dictionary).
I can think of a brute force approach. (find all possible substrings and match) but what would be better ways?
Use a prefix tree for your list of known words. Probably libs like myspell already do so. Try using a ready-made one.
Once you found a match (e.g. 'car'), split your computation: one branch starts to look for a new word ('rot'), another continues to explore variants of current beginning ('carrot').
Effectively you maintain a queue of pairs (start_position, current_position) of offsets into your string every time you split the computation. Several threads can pop from this queue in parallel and try to continue a word that starts from start_position and is already known up to current_position of the pair, but does not end there. When a word is found, it is reported and another pair is popped from the queue. When it's impossible, no result is generated. When a split occurs, a new pair is added to the end of the queue. Initially the queue contains a (0,0).
See this question which has even better answers. It's a standard dynamic programming problem:
How to split a string into words. Ex: "stringintowords" -> "String Into Words"?
A pseudocode implementation, exploiting the fact that every part of the string needs to be a word, so we can't skip anything. We work forward from the start of the string until the first bit is a word, and then generate all possible combinations of the rest of the string. Once we've done that, we keep going along until we find any other possibilities for the first word, and so on.
allPossibleWords(string s, int startPosition) {
    list ret
    if startPosition == s'length
        return [empty split]  // base case: nothing left, so there is exactly one way to finish
    for i in startPosition..s'length
        if isWord(s[startPosition, i])
            ret += s[startPosition, i] * allPossibleWords(s, i)
    return ret
}
The bugbear in this code is that you'll end up repeating calculations - in your example, you'll end up having to calculate allPossibleWords("carrot") twice - once in ["forever", allPossibleWords["carrot"]] and once in ["for", "ever", allPossibleWords["carrot"]]. So memoizing this is something to consider.
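A memoized version of that recursion might look like the following Python sketch. The function name all_splits and the tiny dictionary are made up for the example, and functools.lru_cache does the memoizing, so the splits of each suffix (such as "carrot") are computed only once.

```python
from functools import lru_cache


def all_splits(s, dictionary):
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def splits_from(start):
        # Every way to split s[start:] into dictionary words, as lists of words.
        if start == len(s):
            return [[]]  # one way to split the empty remainder: no words
        results = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in words:
                for rest in splits_from(end):
                    results.append([word] + rest)
        return results

    return [" ".join(parts) for parts in splits_from(0)]


demo_words = {"for", "ever", "forever", "car", "rot", "carrot"}
for line in all_splits("forevercarrot", demo_words):
    print(line)
```

Memoization avoids recomputing the suffixes, but the number of results can still grow exponentially with the length of the string, so listing them all is only practical for short inputs.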
Input String: forevercarrot
Output:
forever carrot
forever car rot
for ever carrot
for ever car rot
program :
#include<iostream>
#include<string>
#include<vector>
#include<string.h>
void strsplit(std::string str)
{
int len=0,i,x,y,j,k;
len = str.size();
std::string s1,s2,s3,s4,s5,s6,s7;
char *c = new char[len+1]();
char *b = new char[len+1]();
char *d = new char[len+1]();
for(i =0 ;i< len-1;i++)
{
std::cout<<"\n";
for(j=0;j<=i;j++)
{
c[j] = str[j];
b[j] = str[j];
s3 += c[j];
y = j+1;
}
for( int h=i+1;h<len;h++){
s5 += str[h];
}
s6 = s3+" "+s5;
std::cout<<" "<<s6<<"\n";
s5 = "";
for(k = y;k<len-1;k++)
{
d[k] = str[k];
s1 += d[k];
s1 += " ";
for(int l = k+1;l<len;l++){
b[l] = str[l];
s2 += b[l];
}
s4 = s3+" "+s1+s2;
s7 = s4;
std::cout<<" "<<s4<<"\n";
s3 = "";s4 = "";
}
s1 = "";s3 = "";
}
}
int main(int argc, char* argv[])
{
std::string str;
if(argc < 2)
std::cout<<"Usage: "<<argv[0]<<" <InputString> "<<"\n";
else{
str = argv[1];
strsplit(str);
}
return 0;
}