Need Help with Studying Binary Search Tree Pseudo-code

I’m studying for a midterm. Can anyone help me get started with this question from my textbook?
Write a function to print out all items of a binary search tree with value v such that min_val ≤ v ≤ max_val.
You can start from the following prototype:
template <class Comparable>
void
BSTree<Comparable>::PrintRange( const Comparable & min_val,
                                const Comparable & max_val ) const;
Analyze the running time of your function in terms of the number of nodes n and the
number of elements k in the range using O (Big-Oh) notation.
Thank you very much.

You can do it with a simple double recursion.
template <class Comparable>
void BSTree<Comparable>::PrintRange( const Comparable & min_val,
                                     const Comparable & max_val ) const
{
    if (min_val <= v && v <= max_val)
    {
        print(v);
        // v is inside the range, so both subtrees may still hold values in range
        if (left != NULL)
            left->PrintRange(min_val, max_val);
        if (right != NULL)
            right->PrintRange(min_val, max_val);
    }
    else if (v < min_val) // go to the right to find a bigger value
    {
        if (right != NULL)
            right->PrintRange(min_val, max_val);
    }
    else // v > max_val: go to the left to find a smaller value
    {
        if (left != NULL)
            left->PrintRange(min_val, max_val);
    }
}
For the running-time analysis: the recursion visits every node whose value lies in the range (k of them) plus at most the nodes on the search paths toward min_val and max_val, so it runs in O(k + h), where h is the height of the tree -- O(k + log n) for a balanced tree and O(n) in the worst case.

Related

Last remaining number

I was asked this question in an interview.
Given an array 'arr' of positive integers and a starting index 'k' into the array, delete the element at k and jump arr[k] steps in the array in a circular fashion. Do this repeatedly until only one element remains. Find the last remaining element.
I thought of an O(n log n) solution using an ordered map. Is an O(n) solution possible?
My guess is that there is not an O(n) solution to this problem based on the fact that it seems to involve doing something that is impossible. The obvious thing you would need to solve this problem in linear time is a data structure like an array that exposes two operations on an ordered collection of values:
O(1) order-preserving deletes from the data structure.
O(1) lookups of the nth undeleted item in the data structure.
However, such a data structure has been formally proven not to exist; see "Optimal Algorithms for List Indexing and Subset Rank" and its citations. Saying that a problem is probably impossible because the natural way to solve it requires an impossible data structure is not a proof, but such an intuition is often correct.
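To make those two requirements concrete, this is roughly the interface such a hypothetical structure would have to expose (this sketch and its names are illustrative, not from the cited paper):
// Illustrative only: the interface discussed above. The argument is that no
// implementation can provide both operations in O(1) at the same time.
struct OrderedDeletableArray {
    // order-preserving delete of the rank-th surviving item; would need to be O(1)
    virtual void deleteAt(int rank) = 0;
    // value of the rank-th surviving item; would also need to be O(1)
    virtual int nthUndeleted(int rank) = 0;
    virtual ~OrderedDeletableArray() {}
};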
Anyway, there are lots of ways to do this in O(n log n). Below is an implementation that maintains a tree of undeleted ranges in the array. GetIndex() returns an index into the original array given a zero-based index into the array as if items had been deleted from it. Such a tree is not self-balancing, so it will have O(n) operations in the worst case, but in the average case Delete and GetIndex will be O(log n).
using System;

namespace CircleGame
{
class Program
{
class ArrayDeletes
{
private class UndeletedRange
{
private int _size;
private int _index;
private UndeletedRange _left;
private UndeletedRange _right;
public UndeletedRange(int i, int sz)
{
_index = i;
_size = sz;
}
public bool IsLeaf()
{
return _left == null && _right == null;
}
public int Size()
{
return _size;
}
public void Delete(int i)
{
if (i >= _size)
throw new IndexOutOfRangeException();
if (! IsLeaf())
{
int left_range = _left._size;
if (i < left_range)
_left.Delete(i);
else
_right.Delete(i - left_range);
_size--;
return;
}
if (i == _size - 1)
{
_size--; // Can delete the last item in a range by decrementing its size
return;
}
if (i == 0) // Can delete the first item in a range by incrementing the index
{
_index++;
_size--;
return;
}
_left = new UndeletedRange(_index, i);
int right_index = i + 1;
_right = new UndeletedRange(_index + right_index, _size - right_index);
_size--;
_index = -1; // the index field of a non-leaf is no longer necessarily valid.
}
public int GetIndex(int i)
{
if (i >= _size)
throw new IndexOutOfRangeException();
if (IsLeaf())
return _index + i;
int left_range = _left._size;
if (i < left_range)
return _left.GetIndex(i);
else
return _right.GetIndex(i - left_range);
}
}
private UndeletedRange _root;
public ArrayDeletes(int n)
{
_root = new UndeletedRange(0, n);
}
public void Delete(int i)
{
_root.Delete(i);
}
public int GetIndex(int indexRelativeToDeletes )
{
return _root.GetIndex(indexRelativeToDeletes);
}
public int Size()
{
return _root.Size();
}
}
static int CircleGame( int[] array, int k )
{
var ary_deletes = new ArrayDeletes(array.Length);
while (ary_deletes.Size() > 1)
{
int next_step = array[ary_deletes.GetIndex(k)];
ary_deletes.Delete(k);
k = (k + next_step - 1) % ary_deletes.Size();
}
return array[ary_deletes.GetIndex(0)];
}
static void Main(string[] args)
{
var array = new int[] { 5,4,3,2,1 };
int last_remaining = CircleGame(array, 2); // third element, this call is zero-based...
}
}
}
Also note that if the values in the array are known to be bounded such that they are always less than some m less than n, there are lots of O(nm) algorithms -- for example, just using a circular linked list.
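As a sketch of that circular-linked-list idea (illustrative code, not from the answer above; the name lastRemainingLinked and the next-array representation are mine): keep the surviving positions in a circular singly linked list and walk arr[cur] hops before each deletion, which is at most m hops per deletion when every value is below m.
#include <iostream>
#include <vector>
using namespace std;

// The jump convention matches the C# code above: after deleting cur, one hop
// lands on the element that followed it.
int lastRemainingLinked(const vector<int>& arr, int k) {
    int n = arr.size();
    vector<int> nxt(n);
    for (int i = 0; i < n; ++i) nxt[i] = (i + 1) % n;

    int prev = k;                        // find the predecessor of the start index
    while (nxt[prev] != k) prev = nxt[prev];

    int remaining = n;
    int cur = k;
    while (remaining > 1) {
        int step = arr[cur];
        nxt[prev] = nxt[cur];            // unlink cur
        --remaining;
        cur = nxt[prev];                 // hop 1: the element after the deleted one
        for (int s = 1; s < step; ++s) { // the remaining step - 1 hops
            prev = cur;
            cur = nxt[cur];
        }
    }
    return arr[cur];
}

int main() {
    vector<int> a = {5, 4, 3, 2, 1};
    cout << lastRemainingLinked(a, 2) << endl;  // same instance as the C# Main above; prints 1
}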
I couldn't think of an O(n) solution. However, we could get O(n log n) average time by using a treap or an augmented BST that stores, in each node, the size of its subtree. The treap enables us to find and remove the kth entry in O(log n) average time.
For example, A = [1, 2, 3, 4] and k = 3 (as Sumit reminded me in the comments, use the array indexes as values in the tree since those are ordered):
        2(0.9)
       /      \
  1(0.81)    4(0.82)
             /
        3(0.76)
Find and remove the 3rd element. Start at 2, which together with its left subtree accounts for 2 elements. Go right. The right child's left subtree has size 1, which together makes 3, so we have found the 3rd element. Remove it:
        2(0.9)
       /      \
  1(0.81)    4(0.82)
Now we're starting on the third element in an array with n - 1 = 3 elements and looking for the 3rd element from there. We'll use zero-indexing to correlate with our modular arithmetic, so the third element in modulus 3 would be 2, and 2 + 3 = 5, 5 mod 3 = 2, the second element. We find it immediately since the root with its left subtree is size 2. Remove:
  4(0.82)
  /
1(0.81)
Now we're starting on the second element in modulus 2, so 1, and we're adding 2. 3 mod 2 is 1. Removing the first element we are left with 4 as the last element.
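If you want runnable code for the find-and-remove-the-kth-remaining-element primitive used in this walkthrough, here is a small sketch (illustrative, not from this answer). It swaps the treap for a Fenwick (binary indexed) tree over the original array positions, which supports the same rank query and deletion in O(log n); the names Fenwick and lastRemaining are mine, and the jump convention follows the C# code earlier in this thread.
#include <iostream>
#include <vector>
using namespace std;

// 1 at every position that still survives; supports point update and
// "position of the k-th surviving element" in O(log n).
struct Fenwick {
    int n;
    vector<int> bit;
    Fenwick(int n) : n(n), bit(n + 1, 0) {
        for (int i = 1; i <= n; ++i) add(i, 1);   // every position starts present
    }
    void add(int i, int delta) {
        for (; i <= n; i += i & -i) bit[i] += delta;
    }
    int kth(int k) const {                        // 1-based rank -> 1-based position
        int pos = 0, pw = 1;
        while ((pw << 1) <= n) pw <<= 1;
        for (; pw > 0; pw >>= 1)
            if (pos + pw <= n && bit[pos + pw] < k) {
                pos += pw;
                k -= bit[pos];
            }
        return pos + 1;
    }
};

int lastRemaining(const vector<int>& arr, int k /* zero-based start */) {
    int n = arr.size();
    Fenwick fw(n);
    int remaining = n;
    int cur = k + 1;                              // 1-based rank of the element to delete
    while (remaining > 1) {
        int idx = fw.kth(cur);                    // original position holding that rank
        int step = arr[idx - 1];
        fw.add(idx, -1);                          // delete it
        --remaining;
        cur = (cur - 1 + step - 1) % remaining + 1;   // jump, same convention as the C# code
    }
    return arr[fw.kth(1) - 1];
}

int main() {
    vector<int> a = {5, 4, 3, 2, 1};
    cout << lastRemaining(a, 2) << endl;          // prints 1, same as the C# example above
}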

Changing numeric base for binary trees

I'm trying to figure out an algorithm for changing the base of numeric binary trees.
For example, let's say I have a set of trees in base 8 and the main root is 4, and I need to scan the tree and change it to base 5 (I can assume that the tree can be changed).
I can't seem to come up with the formula. I know I must do it using recursion, but as soon as I change one root, I'm losing all the connections to its sons.
How can I handle this?
Here is pseudocode for doing it:
void preorder(node* p) {
    if (p == NULL)
        return;
    p->data = change_base(p->data, oldbase, newbase);
    preorder(p->left);
    preorder(p->right);
}

int change_base(int k, int old_base, int new_base) {
    // k stores the digits of a base-old_base number as a decimal literal,
    // e.g. k = 17 with old_base = 8 means the base-8 number "17" (15 in base 10)
    int base10 = 0, weight = 1;
    while (k > 0) {
        base10 += (k % 10) * weight;   // digits come out least significant first
        weight *= old_base;
        k /= 10;
    }
    // re-encode base10 with base-new_base digits, again as a decimal literal
    int result = 0;
    weight = 1;
    while (base10 > 0) {
        result += (base10 % new_base) * weight;
        weight *= 10;
        base10 /= new_base;
    }
    return result;
}
For example, change_base(17, 8, 5) reads "17" as a base-8 number (15 in base 10) and returns 30, its base-5 representation.

What is the best way to recursively generate all binary strings of length n?

I'm looking for a good (easy to implement, intuitive, etc.) recursive method of generating all binary strings of length n, where 1 <= n <= 35.
I would appreciate ideas for a pseudo-code algorithm (no language-specific tricks).
LE: okay, I did go overboard with the upper limit. My intention was to avoid solutions that use the binary representation of a counter from 1 to 1 << n.
Here's an example of recursion in C++.
#include <string>
#include <vector>
using namespace std;

vector<string> answer;

void getStrings( string s, int digitsLeft )
{
    if( digitsLeft == 0 ) // the length of the string is n
        answer.push_back( s );
    else
    {
        getStrings( s + "0", digitsLeft - 1 );
        getStrings( s + "1", digitsLeft - 1 );
    }
}

getStrings( "", n ); // initial call
According to the Divide et Impera (divide and conquer) paradigm, the problem of generating all binary strings of length n can be split into two subproblems: printing all binary strings of length n-1 preceded by a 0, and printing all binary strings of length n-1 preceded by a 1. So the following pseudocode solves the problem:
generateBinary(length, string)
    if(length > 0)
        generateBinary(length-1, string + "0")
        generateBinary(length-1, string + "1")
    else
        print(string)
def genBins(n):
    """
    Generate all binary strings of length n.
    """
    max_int = '0b' + '1' * n
    for i in range(0, int(max_int, 2) + 1, 1):
        yield str(format(i, 'b').zfill(n))


if __name__ == '__main__':
    print(list(genBins(5)))
The problem you have can be solved with a Backtracking algorithm.
Pseudo-code for such an algorithm is:
fun(input, n)
    if( base_case(input, n) )
        //print result
    else
        //choose from pool of choices
        //explore the rest of the choices from what's left
        //unchoose
Implementation:
Base case: we want to print our result string when its size is equal to n
Recursive case:
our pool of choices consists of 0 and 1
choosing in this case means take 0 or 1 and add it to the input as last character
explore by recursing, where we pass the new input value from the choose step until base case is reached
un-choosing in this case means remove the last character
function binary(n) {
    binaryHelper('', n);
}

function binaryHelper(str, n) {
    if (str.length === n) {
        // base case
        console.log(str); // print string
    } else {
        for (let bit = 0; bit < 2; bit++) {
            str = str + bit;        // choose
            binaryHelper(str, n);   // explore
            str = str.slice(0, -1); // un-choose
        }
    }
}
console.log('Size 2 binary strings:');
binary(2);
console.log('Size 3 binary strings:');
binary(3);
You can re-write the code above like this, where you choose & un-choose by stateless transition from one loop iteration to another. This is less intuitive though.
function binary(n) {
    binaryHelper('', n);
}

function binaryHelper(str, n) {
    if (str.length === n) {
        console.log(str);
    } else {
        for (let bit = 0; bit < 2; bit++) {
            binaryHelper(str + bit, n);
        }
    }
}
console.log('Size 2 binary strings:');
binary(2);
console.log('Size 3 binary strings:');
binary(3);

find minimum steps to make a number from a pair of numbers

Let's assume that we have a pair of numbers (a, b). We can get a new pair (a + b, b) or (a, a + b) from the given pair in a single step.
Let the initial pair of numbers be (1,1). Our task is to find number k, that is, the least number of steps needed to transform (1,1) into the pair where at least one number equals n.
I solved it by finding all the possible pairs and then returning the minimum number of steps in which the given number is formed, but it takes quite a long time to compute. I guess this must somehow be related to finding the gcd. Can someone please help, or provide me a link to the relevant concept?
Here is a program that solves the issue, but it is not clear to me...
#include <iostream>
#include <algorithm>
using namespace std;

#define INF 1000000000

int n, r = INF;

// f(a, b) = number of steps needed to build the pair (a, b) from (1, 1),
// computed by undoing the moves (reverse Euclid): with a > b, the last
// a/b moves each added b, so they are undone in one go.
int f(int a, int b) {
    if (b <= 0) return INF;               // (a, b) is unreachable from (1, 1)
    if (a > 1 && b == 1) return a - 1;
    return f(b, a - a / b * b) + a / b;   // a - a/b*b is a % b
}

int main() {
    cin >> n;
    for (int i = 1; i <= n / 2; i++) {
        r = min(r, f(n, i));
    }
    cout << (n == 1 ? 0 : r) << endl;
}
My approach to such problems (one I got from projecteuler.net) is to calculate the first few terms of the sequence and then search OEIS for a sequence with the same terms. This can yield a solution orders of magnitude faster. In your case the sequence is probably http://oeis.org/A178031 but unfortunately it has no easy-to-use formula.
As the constraint on n is relatively small, you can do a DP on the minimum number of steps required to get to the pair (a, b) from (1, 1). Take a two-dimensional array that stores the answer for a given pair and then do a recursion with memoization:
#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;

int mem[5001][5001];               // memo table; assumes n <= 5000
const int BIG = 1000000000;        // "big enough value" (see the EDIT below)

int solve(int a, int b) {
    if (a > b) {
        swap(a, b);                // keep a <= b
    }
    if (a == 0) {
        return BIG;                // (0, b) cannot be reached from (1, 1)
    }
    if (a == 1) {
        return b - 1;              // (1, b) takes exactly b - 1 steps from (1, 1)
    }
    if (mem[a][b] != -1) {
        return mem[a][b];
    }
    // the last b/a steps into (a, b) all added a to the larger component,
    // so undo them in one go (reverse Euclid)
    return mem[a][b] = solve(a, b % a) + b / a;
}

int main() {
    memset(mem, -1, sizeof(mem));
    int n;
    cin >> n;
    int best = -1;
    for (int i = 1; i <= n; ++i) {
        int temp = solve(n, i);
        if (best == -1 || temp < best) {
            best = temp;
        }
    }
    cout << best << endl;
}
In fact in this case there is not much difference between dp and BFS, but this is the general approach to such problems. Hope this helps.
EDIT: return a big enough value in the dp if a is zero
You can use the breadth-first search algorithm to do this. At each step you generate all possible NEXT pairs that you haven't seen before. If the set of next pairs contains the result, you're done; if not, repeat. The number of times you repeat is the minimum number of transformations.
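A minimal sketch of that BFS (illustrative code; minStepsBFS is my name for it): expand level by level from (1,1), and prune pairs with a component above n, which is safe because components never decrease, so an overshoot can never come back down to n. This is practical only for smallish n, since the number of reachable pairs grows quickly.
#include <iostream>
#include <queue>
#include <set>
#include <utility>
using namespace std;

// Each BFS level is one transformation, so the level at which a component
// first equals n is the minimum number of steps.
int minStepsBFS(int n) {
    if (n == 1) return 0;
    set<pair<int,int> > seen;
    queue<pair<int,int> > q;
    q.push(make_pair(1, 1));
    seen.insert(make_pair(1, 1));
    int steps = 0;
    while (!q.empty()) {
        ++steps;
        int levelSize = q.size();
        while (levelSize--) {
            pair<int,int> cur = q.front();
            q.pop();
            pair<int,int> nexts[2] = { make_pair(cur.first + cur.second, cur.second),
                                       make_pair(cur.first, cur.first + cur.second) };
            for (int j = 0; j < 2; ++j) {
                if (nexts[j].first == n || nexts[j].second == n) return steps;
                if (nexts[j].first <= n && nexts[j].second <= n && seen.insert(nexts[j]).second)
                    q.push(nexts[j]);
            }
        }
    }
    return -1; // not reached for n >= 1
}

int main() {
    int n;
    cin >> n;
    cout << minStepsBFS(n) << endl;
}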
First of all, the maximum number you can get after k-3 steps is the kth Fibonacci number. Let t be the golden ratio.
Now, for n, start with (n, upper(n/t)).
If x > y:
    NumSteps(x, y) = NumSteps(x - y, y) + 1
Else:
    NumSteps(x, y) = NumSteps(x, y - x) + 1
Iteratively calculate NumSteps(n, upper(n/t)).
PS: Using upper(n/t) might not always provide the optimal solution. You can do some local search around this value for the optimal result. To ensure optimality you can try ALL the values from 0 to n-1, in which case the worst-case complexity is O(n^2). But if the optimal value results from a value close to upper(n/t), the solution is O(n log n).

Selecting an optimum set according to ranked criteria

I am given a string, and a set of rules which select valid substrings by a process which isn't important here. Given an enumeration of all valid substrings, I have to find the optimum set of substrings according to a set of ranked criteria, such as:
Substrings may not overlap
All characters must be part of a substring if possible
Use as few different substrings as possible
etc.
For example, given the string abc and the substrings [a, ab, bc], the optimal set of substrings by the preceding rules is [a, bc].
Currently I'm doing this by a standard naive algorithm of enumerating all possible sets of substrings, then iterating over them to find the best candidate. The problem is that as the length of the string and the number of substrings go up, the number of possible sets increases exponentially. With 50 substrings (well within possibility for this app), the number of sets to enumerate is 2^50, which is extremely prohibitive.
It seems like there should be a way to avoid generating many of the sets that will obviously be losers, or to algorithmically converge on the optimum set without having to blindly generate every candidate. What options are there?
Note that for this application it may be acceptable to use an algorithm that offers a statistical rather than absolute guarantee, such as an n% chance of hitting a non-optimal candidate, where n is suitably small.
Looks to me like a tree structure is needed.
Basically your initial branching is on all the substrings, then on all but the one you used in the first round, and so on all the way to the bottom. You're right that this branches to 2^50, but if you use ab-pruning (alpha-beta pruning) to quickly terminate branches that are obviously inferior, and then add some memoization to prune situations you've seen before, you can speed things up considerably.
You'll probably have to do a fair amount of AI reading to get it all, but the Wikipedia pages on ab-pruning and transposition tables will get you a start.
edit:
Yep you're right, probably not clear enough.
Assuming your example "ABABABAB BABABABA" with substrings {"ABAB","BABA"}.
If you set your evaluation function to simply treat wasted characters as bad the tree will go something like this:
ABAB (eval=0)
ABAB (eval=0)
ABAB (eval=2 because we move past/waste a space char and a B)
[missing expansion]
BABA (eval=1 because we only waste the space)
ABAB (eval=2 now have wasted the space above and a B at this level)
BABA (eval=1 still only wasted the space)*
BABA (eval=1 prune here because we already have a result that is 1)
BABA (eval=1 prune here for same reason)
*best solution
I suspect the simple 'wasted chars' isn't enough in the non trivial example but it does prune half the tree here.
Here's a working solution in Haskell. I have called the unique substrings symbols, and an association of one occurrence of the substrings a placement. I have also interpreted criterion 3 ("Use as few different substrings as possible") as "use as few symbols as possible", as opposed to "use as few placements as possible".
This is a dynamic programming approach; the actual pruning occurs due to the memoization. Theoretically, a smart Haskell implementation could do the memoization for you (though there are other ways, where you wrap makeFindBest); I'd suggest using a bitfield to represent the used symbols and just an integer to represent the remaining string. The optimisation is possible because of the following fact: given optimal solutions for the strings S1 and S2 that both use the same set of symbols, if S1 and S2 are concatenated then the two solutions can be concatenated in a similar manner and the new solution will be optimal. Hence for each partition of the input string, makeFindBest need only be evaluated once on the postfix for each possible set of symbols used in the prefix.
I've also integrated branch-and-bound pruning as suggested in Daniel's answer; this makes use of an evaluation function which becomes worse the more characters skipped. The cost is monotonic in the number of characters processed, so that if we have found a set of placements that wasted only alpha characters, then we never again try to skip more than alpha characters.
Where n is the string length and m is the number of symbols, the worst case is O(m^n) naively, and m is O(2^n). Note that removing constraint 3 would make things much quicker: the memoization would only need to be parameterized by the remaining string which is an O(n) cache, as opposed to O(n * 2^m)!
Using a string search/matching algorithm such as Aho-Corasick's string matching algorithm, improves the consume/drop 1 pattern I use here from exponential to quadratic. However, this by itself doesn't avoid the factorial growth in the combinations of the matches, which is where the dynamic programming helps.
Also note that your 4th "etc." criteria could possibly change the problem a lot if it constrains the problem in a way that makes it possible to do more aggressive pruning, or requires backtracking!
module Main where

import List
import Maybe
import System.Environment

type Symbol = String
type Placement = String

-- (remaining, placement or Nothing to skip one character)
type Move = (String, Maybe Placement)

-- (score, usedsymbols, placements)
type Solution = (Int, [Symbol], [Placement])

-- invoke like ./a.out STRING SPACE-SEPARATED-SYMBOLS ...
-- e.g. ./a.out "abcdeafghia" "a bc fg"
-- output is a list of placements
main = do
    argv <- System.Environment.getArgs
    let str = head argv
        symbols = concat (map words (tail argv))
    (putStr . show) $ findBest str symbols
    putStr "\n"

getscore :: Solution -> Int
getscore (sc,_,_) = sc

-- | consume STR SYM consumes SYM from the start of STR. returns (s, SYM)
-- where s is the rest of STR, after the consumed occurrence, or Nothing if
-- SYM isnt a prefix of STR.
consume :: String -> Symbol -> Maybe Move
consume str sym = if sym `isPrefixOf` str
                      then (Just (drop (length sym) str, (Just sym)))
                      else Nothing

-- | addToSoln SYMBOLS P SOL incrementally updates SOL with the new SCORE and
-- placement P
addToSoln :: [Symbol] -> Maybe Placement -> Solution -> Solution
addToSoln symbols Nothing (sc, used, ps) = (sc - (length symbols) - 1, used, ps)
addToSoln symbols (Just p) (sc, used, ps) =
    if p `elem` symbols
        then (sc - 1, used `union` [p], p : ps)
        else (sc, used, p : ps)

reduce :: [Symbol] -> Solution -> Solution -> [Move] -> Solution
reduce _ _ cutoff [] = cutoff
reduce symbols parent cutoff ((s,p):moves) =
    let sol = makeFindBest symbols (addToSoln symbols p parent) cutoff s
        best = if (getscore sol) > (getscore cutoff)
                   then sol
                   else cutoff
    in reduce symbols parent best moves

-- | makeFindBest SYMBOLS PARENT CUTOFF STR searches for the best placements
-- that can be made on STR from SYMBOLS, that are strictly better than CUTOFF,
-- and prepends those placements to PARENTs third element.
makeFindBest :: [Symbol] -> Solution -> Solution -> String -> Solution
makeFindBest _ cutoff _ "" = cutoff
makeFindBest symbols parent cutoff str =
    -- should be memoized by (snd parent) (i.e. the used symbols) and str
    let moves = if (getscore parent) > (getscore cutoff)
                    then (mapMaybe (consume str) symbols) ++ [(drop 1 str, Nothing)]
                    else (mapMaybe (consume str) symbols)
    in reduce symbols parent cutoff moves

-- a solution that makes no placements
worstScore str symbols = -(length str) * (1 + (length symbols))

findBest str symbols =
    (\(_,_,ps) -> reverse ps)
        (makeFindBest symbols (0, [], []) (worstScore str symbols, [], []) str)
This smells like a dynamic programming problem. You can find a number of good sources on it, but the gist is that you generate a collection of subproblems, and then build up "larger" optimal solutions by combining optimal subsolutions.
This is an answer rewritten to use the Aho-Corasick string-matching algorithm and Dijkstra's algorithm, in C++. This should be a lot closer to your target language of C#.
The Aho-Corasick step constructs an automaton (based on a suffix tree) from the set of patterns, and then uses that automaton to find all matches in the input string. Dijkstra's algorithm then treats those matches as nodes in a DAG, and moves toward the end of the string looking for the lowest cost path.
This approach is a lot easier to analyze, as it's simply combining two well-understood algorithms.
Constructing the Aho-Corasick automaton is linear time in the length of the patterns, and then the search is linear in the input string + the cumulative length of the matches.
Dijkstra's algorithm runs in O(|E| + |V| log |V|) assuming an efficient STL. The graph is a DAG, where vertices correspond to matches or to runs of characters that are skipped. Edge weights are the penalty for using an extra pattern or for skipping characters. An edge exists between two matches if they are adjacent and non-overlapping. An edge exists from a match m to a skip if that is the shortest possible skip between m and another match m2 that overlaps with some match m3 starting at the same place as the skip (phew!). The structure of Dijkstra's algorithm ensures that the optimal answer is the first one to be found by the time we reach the end of the input string (it achieves the pruning Daniel suggested implicitly).
#include <iostream>
#include <queue>
#include <vector>
#include <list>
#include <string>
#include <algorithm>
#include <set>
using namespace std;
static vector<string> patterns;
static string input;
static int skippenalty;
struct acnode {
acnode() : failure(NULL), gotofn(256) {}
struct acnode *failure;
vector<struct acnode *> gotofn;
list<int> outputs; // index into patterns global
};
void
add_string_to_trie(acnode *root, const string &s, int sid)
{
for (string::const_iterator p = s.begin(); p != s.end(); ++p) {
if (!root->gotofn[*p])
root->gotofn[*p] = new acnode;
root = root->gotofn[*p];
}
root->outputs.push_back(sid);
}
void
init_tree(acnode *root)
{
queue<acnode *> q;
unsigned char c = 0;
do {
if (acnode *u = root->gotofn[c]) {
u->failure = root;
q.push(u);
} else
root->gotofn[c] = root;
} while (++c);
while (!q.empty()) {
acnode *r = q.front();
q.pop();
do {
acnode *u, *v;
if (!(u = r->gotofn[c]))
continue;
q.push(u);
v = r->failure;
while (!v->gotofn[c])
v = v->failure;
u->failure = v->gotofn[c];
u->outputs.splice(u->outputs.begin(), v->gotofn[c]->outputs);
} while (++c);
}
}
struct match { int begin, end, sid; };
void
ahocorasick(const acnode *state, list<match> &out, const string &str)
{
int i = 1;
for (string::const_iterator p = str.begin(); p != str.end(); ++p, ++i) {
while (!state->gotofn[*p])
state = state->failure;
state = state->gotofn[*p];
for (list<int>::const_iterator q = state->outputs.begin();
q != state->outputs.end(); ++q) {
struct match m = { i - patterns[*q].size(), i, *q };
out.push_back(m);
}
}
}
////////////////////////////////////////////////////////////////////////
bool operator<(const match& m1, const match& m2)
{
return m1.begin < m2.begin
|| (m1.begin == m2.begin && m1.end < m2.end);
}
struct dnode {
int usedchars;
vector<bool> usedpatterns;
int last;
};
bool operator<(const dnode& a, const dnode& b) {
return a.usedchars > b.usedchars
|| (a.usedchars == b.usedchars && a.usedpatterns < b.usedpatterns);
}
bool operator==(const dnode& a, const dnode& b) {
return a.usedchars == b.usedchars
&& a.usedpatterns == b.usedpatterns;
}
typedef priority_queue<pair<int, dnode>,
vector<pair<int, dnode> >,
greater<pair<int, dnode> > > mypq;
void
dijkstra(const vector<match> &matches)
{
typedef vector<match>::const_iterator mIt;
vector<bool> used(patterns.size(), false);
dnode initial = { 0, used, -1 };
mypq q;
set<dnode> last;
dnode d;
q.push(make_pair(0, initial));
while (!q.empty()) {
int cost = q.top().first;
d = q.top().second;
q.pop();
if (last.end() != last.find(d)) // we've been here before
continue;
last.insert(d);
if (d.usedchars >= input.size()) {
break; // found optimum
}
match m = { d.usedchars, 0, 0 };
mIt mp = lower_bound(matches.begin(), matches.end(), m);
if (matches.end() == mp) {
// no more matches, skip the remaining string
dnode nextd = d;
nextd.usedchars = input.size();
int skip = nextd.usedchars - d.usedchars;
nextd.last = -skip;
q.push(make_pair(cost + skip * skippenalty, nextd));
continue;
}
// keep track of where the shortest match ended; we don't need to
// skip more than this.
int skipmax = (mp->begin == d.usedchars) ? mp->end : mp->begin + 1;
while (mp != matches.end() && mp->begin == d.usedchars) {
dnode nextd = d;
nextd.usedchars = mp->end;
int extra = nextd.usedpatterns[mp->sid] ? 0 : 1; // extra pattern
int nextcost = cost + extra;
nextd.usedpatterns[mp->sid] = true;
nextd.last = mp->sid * 2 + extra; // encode used pattern
q.push(make_pair(nextcost, nextd));
++mp;
}
if (mp == matches.end() || skipmax <= mp->begin)
continue;
// skip
dnode nextd = d;
nextd.usedchars = mp->begin;
int skip = nextd.usedchars - d.usedchars;
nextd.last = -skip;
q.push(make_pair(cost + skip * skippenalty, nextd));
}
// unwind
string answer;
while (d.usedchars > 0) {
if (0 > d.last) {
answer = string(-d.last, '*') + answer;
d.usedchars += d.last;
} else {
answer = "[" + patterns[d.last / 2] + "]" + answer;
d.usedpatterns[d.last / 2] = !(d.last % 2);
d.usedchars -= patterns[d.last / 2].length();
}
set<dnode>::const_iterator lp = last.find(d);
if (last.end() == lp) return; // should not happen
d.last = lp->last;
}
cout << answer;
}
int
main()
{
int n;
cin >> n; // read n patterns
patterns.reserve(n);
acnode root;
for (int i = 0; i < n; ++i) {
string s;
cin >> s;
patterns.push_back(s);
add_string_to_trie(&root, s, i);
}
init_tree(&root);
getline(cin, input); // eat the rest of the first line
getline(cin, input);
cerr << "got input: " << input << endl;
list<match> matches;
ahocorasick(&root, matches, input);
vector<match> vmatches(matches.begin(), matches.end());
sort(vmatches.begin(), vmatches.end());
skippenalty = 1 + patterns.size();
dijkstra(vmatches);
return 0;
}
Here is a test file with 52 single-letter patterns (compile and then run with the test file on stdin):
52 a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
