Convert string a to b using a dictionary of words - algorithm

You have a dictionary of words and two strings a and b.
How can one convert a to b by changing only one character at a time and making sure that all the intermediate words are in the dictionary?
Example:
dictionary: {"cat", "bat", "hat", "bad", "had"}
a = "bat"
b = "had"
solution:
"bat" -> "bad" -> "had"
EDIT: The solutions given below propose building a graph from the dictionary words such that every word will have an edge to all other words differing by just one character.
This may be somewhat difficult if the dictionary is too big (let us say we are not talking about English-language words only).
Also, even if this is acceptable, what is the best algorithm to create such a graph? Finding the edges from one word to all other words would be O(n), where n is the dictionary size, so would total graph construction be O(n^2)? Is there a better algorithm?
This is not a homework problem but an interview question.

You can think of this as a graph search problem. Each word is a node in the graph, and there is an edge between two words if they differ by exactly one letter. Running a BFS over this graph will then find the shortest path between your start word and the destination word (if it's possible to turn one word into the other) and will report that there is no way to do this otherwise.
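A minimal Python sketch of this BFS, under a few assumptions: the name word_ladder and the wildcard-bucket trick are illustrative and not taken from the answer above, and words are assumed not to contain the character "*". Grouping words by patterns such as "b*t" finds each word's neighbours without comparing every pair of words, which speaks to the O(n^2) graph-construction concern raised in the question.

```python
from collections import defaultdict, deque


def word_ladder(words, start, goal):
    """Return a shortest list of words from start to goal, or None."""
    words = set(words) | {start}
    # Bucket words by wildcard pattern, e.g. "bat" -> "*at", "b*t", "ba*".
    # Two distinct words share a bucket only if they differ in that one position.
    buckets = defaultdict(list)
    for w in words:
        for i in range(len(w)):
            buckets[w[:i] + "*" + w[i + 1:]].append(w)

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk the parent links back to the start word.
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for i in range(len(word)):
            for nxt in buckets[word[:i] + "*" + word[i + 1:]]:
                if nxt not in parent:
                    parent[nxt] = word
                    queue.append(nxt)
    return None  # goal is not reachable through dictionary words
```

On the example dictionary, word_ladder({"cat", "bat", "hat", "bad", "had"}, "bat", "had") returns a three-word path; which of the two equally short paths comes back depends on set iteration order.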

Simply do a BFS over the graph whose nodes are the words and there is an edge between two nodes iff the words on the nodes differ by one letter. In this way, you could provide a solution by starting BFS from the start word given. If you reach the destination node, then it's possible, otherwise not.
You can also report the steps taken; as a bonus, BFS guarantees that this is the smallest possible number of steps.
P.S.: It's a coincidence that this question was asked to me too in an interview and I coded this solution!

How can one convert a to b by changing only one character at a time
and making sure that all the intermediate words are in the dictionary?
Finding the words that are one change away from the input is a straightforward O(nm) scan,
where n is the number of words in the dictionary
and m is the number of characters in the input word.
The algorithm is simple: if a word from the dictionary differs from the input in exactly one character, add it to the solution set:
FOR EACH WORD W IN DICTIONARY DO
    IF SIZE(W) = SIZE(INPUT) THEN
        MIS = 0
        FOR i: 1..SIZE(INPUT) IF W[i] != INPUT[i] THEN MIS = MIS + 1
        IF MIS = 1 THEN SOLUTION.ADD(W)
    END-IF
END-FOR

Pre-build and re-use a travel map.
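For example, build a scity[][] matrix that marks which word pairs are one change apart, so that it can be re-used.
Just a quick exercise for job hunting; it might be simplified.
#include <stdio.h>

#define SLEN 10

const char *dict[SLEN] = {
    "bat",
    "hat",
    "bad",
    "had",
    "mad",
    "tad",
    "het",
    "hep",
    "hady",
    "bap"};
int minD = 0xfffff;

/* Counts the differences between a and b, giving up (returning 0) at the
   second mismatching character; a length difference counts as one more. */
int edst(const char *a, const char *b)
{
    const char *ip = a, *op = b;
    int d = 0;
    while ((*ip) && (*op))
        if (*ip++ != *op++)
        {
            if (d) return 0;
            d++;
        }
    if ((*op) || (*ip)) d++;
    return d;
}

/* Renamed from strlen so that it does not clash with the standard library. */
int slen(const char *a)
{
    const char *ip = a;
    int i = 0;
    while (*ip++)
        i++;
    return i;
}

/* 1 if dict[a] and dict[b] are distinct, equally long and one change apart. */
int valid(const char *dict[], int a, int b)
{
    if ((a == b) || (slen(dict[a]) != slen(dict[b])) || (edst(dict[a], dict[b]) != 1)) return 0;
    return 1;
}

/* a[pos..SLEN-1] is the route built so far (backwards from the target word);
   try every remaining word in position pos-1. */
void sroute(int scity[SLEN][SLEN], const char *dict[], int a[], int end, int pos)
{
    int i, j, d = 0;
    if (a[pos] == end)
    {
        for (i = pos; i < (SLEN - 1); i++)
        {
            printf("%s ", dict[a[i]]);
            d += scity[a[i]][a[i + 1]];
        }
        printf(" %s=%d\n", dict[a[SLEN - 1]], d);
        if (d < minD) minD = d;
        return;
    }
    if (pos == 0) return; /* no position left to fill; avoids reading a[-1] */
    for (i = pos - 2; i >= 0; i--)
    {
        int b[SLEN];
        for (j = 0; j < SLEN; j++) b[j] = a[j];
        b[pos - 1] = a[i];
        b[i] = a[pos - 1];
        if (scity[b[pos - 1]][b[pos]] == 1)
            sroute(scity, dict, b, end, pos - 1);
    }
    if (scity[a[pos - 1]][a[pos]] == 1) sroute(scity, dict, a, end, pos - 1);
}

/* Builds the re-usable map, then prints every route from dict[a] to dict[b]. */
void initS(int scity[SLEN][SLEN], const char *dict[], int a, int b)
{
    int i, j;
    int c[SLEN];
    for (i = 0; i < SLEN; i++)
        for (j = 0; j < SLEN; j++)
            scity[i][j] = valid(dict, i, j);
    for (i = 0; i < SLEN; i++) c[i] = i;
    c[SLEN - 1] = b;
    c[b] = SLEN - 1;
    sroute(scity, dict, c, a, SLEN - 1);
    printf("min=%d\n", minD);
}

int main(void)
{
    int scity[SLEN][SLEN];
    initS(scity, dict, 0, 3); /* routes from "bat" to "had" */
    return 0;
}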

Related

Finding the longest sub-string with no repetition in a string. Time Complexity?

I recently interviewed with a company for a software engineering position. I was asked to find the longest substring without repeated characters in a string. My algorithm was as follows:
Start from the left-most character, and keep storing each character in a hash table with the character as the key and the index where it last occurred as the value. Add the character to the answer string as long as it's not present in the hash table. If we encounter a stored character again, I stop and note down the length. I empty the hash table and then start again from the index just to the right of the earlier occurrence of the repeated character. That index is retrieved from the stored index_where_it_last_occurred value. If I ever reach the end of the string, I stop and return the longest length.
For example, say the string was abcdecfg.
I start with a and store it in the hash table. I store b and so on till e. Their indexes are stored as well. When I encounter c again, I stop since it's already hashed and note down the length, which is 5. I empty the hash table, and start again just to the right of the earlier occurrence of the repeated character. The repeated character being c, I start again from position 3, i.e. the character d. I keep doing this until I reach the end of the string.
I am interested in knowing what the time complexity of this algorithm will be. IMO, it'll be O(n^2).
This is the code.
import java.util.*;

public class longest
{
    static int longest_length = -1;

    public static void main(String[] args)
    {
        Scanner in = new Scanner(System.in);
        String str = in.nextLine();
        calc(str,0);
        System.out.println(longest_length);
    }

    public static void calc(String str,int index)
    {
        if(index >= str.length()) return;
        int temp_length = 0;
        LinkedHashMap<Character,Integer> map = new LinkedHashMap<Character,Integer>();
        for (int i = index; i<str.length();i++)
        {
            if(!map.containsKey(str.charAt(i)))
            {
                map.put(str.charAt(i),i);
                ++temp_length;
            }
            else
            {
                if(longest_length < temp_length)
                {
                    longest_length = temp_length;
                }
                int last_index = map.get(str.charAt(i));
                // System.out.println(last_index);
                calc(str,last_index+1);
                break;
            }
        }
        if(longest_length < temp_length)
            longest_length = temp_length;
    }
}
If the alphabet is of size K, then when you restart counting you jump back at most K-1 places, so you read each character of the string at most K times. So the algorithm is O(nK).
The input string which contains n/K copies of the alphabet exhibits this worst-case behavior. For example if the alphabet is {a, b, c}, strings of the form "abcabcabc...abc" have the property that nearly every character is read 3 times by your algorithm.
You can solve the original problem in O(K+n) time, using O(K) storage space by using dynamic programming.
Let the string be s. We keep M, the length of the longest unique-character substring ending at position i; P, which stores where each character was last seen; and best, the length of the longest unique-character substring found so far.
Start:
    Set P[c] = -1 for each c in the alphabet.
    M = 0
    best = 0
Then, for each i:
    M = min(M+1, i-P[s[i]])
    best = max(best, M)
    P[s[i]] = i
This is trivially O(K) in storage, and O(K+n) in running time.
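The recurrence above translates almost line for line into Python. This is a sketch under one assumption: a dictionary stands in for the array P, with a missing key playing the role of -1, so no alphabet has to be fixed in advance. The function name is illustrative.

```python
def longest_unique_substring_length(s):
    """Length of the longest substring of s with no repeated character."""
    last_seen = {}  # P: index where each character was last seen
    m = 0           # M: longest repeat-free substring ending at position i
    best = 0
    for i, ch in enumerate(s):
        # Either extend the previous window by one character, or cut it
        # just after the previous occurrence of ch, whichever is shorter.
        m = min(m + 1, i - last_seen.get(ch, -1))
        best = max(best, m)
        last_seen[ch] = i
    return best
```

For the string abcdecfg from the question this returns 5, matching both abcde and decfg.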

Given a dictionary, find all possible letter orderings

I was recently asked the following interview question:
You have a dictionary page written in an alien language. Assume that
the language is similar to English and is read/written from left to
right. Also, the words are arranged in lexicographic order. For
example the page could be: ADG, ADH, BCD, BCF, FM, FN
You have to give all lexicographic orderings possible of the character
set present in the page.
My approach is as follows:
A has higher precedence than B, and G has higher precedence than H.
Therefore we have the information about ordering for some characters:
A->B, B->F, G->H, D->F, M->N
The possible orderings include ABDFGHMNC, ACBDFGHMN, ...
My approach was to use an array as position holder and generate all permutations to identify all valid orderings. The worst-case time complexity for this is O(N!), where N is the size of the character set.
Can we do better than the brute-force approach?
Thanks in advance.
Donald Knuth wrote the paper A Structured Program to Generate all Topological Sorting Arrangements, originally published in 1974. The following quote from the paper brought me to a better understanding of the problem (in the text, the relation i < j stands for "i precedes j"):
A natural way to solve this problem is to let x_1 be an
element having no predecessors, then to erase all relations of the
form x_1 < j and to let x_2 be an element ≠
x_1 with no predecessors in the system as it now exists,
then to erase all relations of the form x_2 < j, etc. It is
not difficult to verify that this method will always succeed unless
there is an oriented cycle in the input. Moreover, in a sense it is
the only way to proceed, since x_1 must be an element
without predecessors, and x_2 must be without predecessors
when all relations x_1 < j are deleted, etc. This
observation leads naturally to an algorithm that finds all
solutions to the topological sorting problem; it is a typical example
of a "backtrack" procedure, where at every stage we consider a
subproblem of the form "Find all ways to complete a given partial
permutation x_1 x_2 ... x_k to a
topological sort x_1 x_2 ... x_n." The
general method is to branch on all possible choices of
x_{k+1}. A central problem in backtrack applications is
to find a suitable way to arrange the data so that it is easy to
sequence through the possible choices of x_{k+1}; in this
case we need an efficient way to discover the set of all elements ∉
{x_1,...,x_k} which have no predecessors other
than x_1,...,x_k, and to maintain this knowledge
efficiently as we move from one subproblem to another.
The paper includes pseudocode for an efficient algorithm. The time complexity per output is O(m+n), where m is the number of input relations and n is the number of letters. I have written a C++ program that implements the algorithm described in the paper, keeping its variable and function names, and takes the letters and relations from your question as input. I hope nobody objects to a concrete program in this answer, given the language-agnostic tag.
#include <algorithm>
#include <iostream>
#include <deque>
#include <vector>
#include <iterator>
#include <map>

// Define Input
static const char input[] =
    { 'A', 'D', 'G', 'H', 'B', 'C', 'F', 'M', 'N' };
static const char crel[][2] =
    {{'A', 'B'}, {'B', 'F'}, {'G', 'H'}, {'D', 'F'}, {'M', 'N'}};
static const int n = sizeof(input) / sizeof(char);
static const int m = sizeof(crel) / sizeof(*crel);

std::map<char, int> count;  // number of not-yet-placed predecessors per letter
std::map<char, int> top;    // index of the first relation starting at a letter, -1 if none
std::map<int, char> suc;    // successor letter of relation p
std::map<int, int> next;    // next relation with the same first letter, -1 at the end
std::deque<char> D;         // letters that currently have no predecessors
std::vector<char> buffer;   // the partial ordering built so far

void alltopsorts(int k)
{
    if (D.empty())
        return;
    char base = D.back();
    do
    {
        char q = D.back();
        D.pop_back();
        buffer[k] = q;
        if (k == (n - 1))
        {
            for (std::vector<char>::const_iterator cit = buffer.begin();
                 cit != buffer.end(); ++cit)
                std::cout << (*cit);
            std::cout << std::endl;
        }
        // erase relations beginning with q:
        int p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            count[j]--;
            if (!count[j])
                D.push_back(j);
            p = next[p];
        }
        alltopsorts(k + 1);
        // retrieve relations beginning with q:
        p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            if (!count[j])
                D.pop_back();
            count[j]++;
            p = next[p];
        }
        D.push_front(q);
    }
    while (D.back() != base);
}

int main()
{
    // Prepare
    std::fill_n(std::back_inserter(buffer), n, 0);
    for (int i = 0; i < n; i++) {
        count[input[i]] = 0;
        top[input[i]] = -1;
    }
    for (int i = 0; i < m; i++) {
        suc[i] = crel[i][1]; next[i] = top[crel[i][0]];
        top[crel[i][0]] = i; count[crel[i][1]]++;
    }
    for (std::map<char, int>::const_iterator cit = count.begin();
         cit != count.end(); ++cit)
        if (!(*cit).second)
            D.push_back((*cit).first);
    alltopsorts(0);
}
There is no algorithm that can do better than O(N!) if there are N! answers. But I think there is a better way to understand the problem:
You can build a directed graph in this way: if A appears before B, then there is an edge from A to B. After building the graph, you just need to find all possible topological sort results. Still O(N!), but easier to code and better than your approach (don't have to generate invalid ordering).
I would solve it like this:
Look at the first letter: (A -> B -> F)
Look at the second letter, but only compare words that share the same first letter: (D), (C), (M -> N)
Look at the third letter, but only compare words that share the same first and second letters: (G -> H), (D -> F)
And so on, while anything remains (look at the Nth letter, grouping by the previous letters).
What is in parentheses is all the information you get from the set (all the known orderings). Ignore parentheses with only one letter, because they do not represent an ordering. Then take everything in parentheses and topologically sort.
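The two stages can be sketched in Python as follows. Two assumptions to flag: the sketch compares adjacent words at their first differing letter instead of grouping explicitly by prefix (for a sorted page this yields the same relations), and it enumerates orderings with a plain backtracking search, not the more refined bookkeeping of the Knuth paper. All function names are illustrative.

```python
def relations_from_sorted_words(words):
    """Derive (x, y) pairs meaning 'x precedes y' from adjacent sorted words."""
    relations = set()
    for first, second in zip(words, words[1:]):
        for x, y in zip(first, second):
            if x != y:
                relations.add((x, y))  # the first differing letter decides
                break
    return relations


def all_orderings(letters, relations):
    """Yield every ordering of letters that respects all relations."""
    letters = sorted(letters)
    successors = {c: [] for c in letters}
    indegree = {c: 0 for c in letters}
    for x, y in relations:
        successors[x].append(y)
        indegree[y] += 1
    prefix, used = [], set()

    def extend():
        if len(prefix) == len(letters):
            yield "".join(prefix)
            return
        for c in letters:
            # Only letters whose predecessors are all placed may come next.
            if c not in used and indegree[c] == 0:
                used.add(c)
                prefix.append(c)
                for y in successors[c]:
                    indegree[y] -= 1
                yield from extend()
                for y in successors[c]:
                    indegree[y] += 1
                prefix.pop()
                used.remove(c)

    yield from extend()
```

For the page ADG, ADH, BCD, BCF, FM, FN this derives the five relations A->B, B->F, G->H, D->F, M->N and enumerates 11340 valid orderings of the nine letters, so no invalid permutation is ever generated.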
ok, i admit straight away that i don't have an estimate of time complexity for the average case, but maybe the following two observations will help.
first, this is an obvious candidate for a constraint library. if you were doing this in practice (like, it was some task at work) then you would get a constraint solver, give it the various pair-wise orderings you have, and then ask for a list of all results.
second, that is typically implemented as a search. if you have N characters consider a tree whose root node has N children (selection of the first character); next node has N-1 children (selection of second character); etc. clearly this is N! worst case for full exploration.
even with a "dumb" search, you can see that you can often prune searches by checking your order at any point against the pairs that you have.
but since you know that a total ordering exists, even though you (may) only have partial information, you can make the search more efficient. for example, you know that the first character must not appear to the "right" of < for any pair (if we assume that each character is given a numerical value, with the first character being lowest). similarly, moving down the tree, for the appropriately reduced data.
in short, you can enumerate possible solutions by exploring a tree, using the incomplete ordering information to constrain possible choices at each node.
hope that helps some.

Find all chordless cycles in an undirected graph

How to find all chordless cycles in an undirected graph?
For example, given the graph
0 --- 1
|     | \
|     |  \
4 --- 3 - 2
the algorithm should return 1-2-3 and 0-1-3-4, but never 0-1-2-3-4.
(Note: [1] This question is not the same as small cycle finding in a planar graph because the graph is not necessarily planar. [2] I have read the paper Generating all cycles, chordless cycles, and Hamiltonian cycles with the principle of exclusion but I don't understand what they're doing :). [3] I have tried CYPATH but the program only gives the count, algorithm EnumChordlessPath in readme.txt has significant typos, and the C code is a mess. [4] I am not trying to find an arbitrary set of fundamental cycles. A cycle basis can have chords.)
1. Assign numbers to nodes from 1 to n.
2. Pick the node number 1. Call it 'A'.
3. Enumerate pairs of links coming out of 'A'.
4. Pick one. Let's call the adjacent nodes 'B' and 'C' with B less than C.
5. If B and C are connected, then output the cycle ABC, return to step 3 and pick a different pair.
6. If B and C are not connected:
   - Enumerate all nodes connected to B. Suppose it's connected to D, E, and F. Create a list of vectors CABD, CABE, CABF. For each of these:
     - if the last node is connected to any internal node except C and B, discard the vector
     - if the last node is connected to C, output and discard
     - if it's not connected to either, create a new list of vectors, appending all nodes to which the last node is connected.
   - Repeat until you run out of vectors.
7. Repeat steps 3-5 with all pairs.
8. Remove node 1 and all links that lead to it. Pick the next node and go back to step 2.
Edit: and you can do away with one nested loop.
This seems to work at the first sight, there may be bugs, but you should get the idea:
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>
using namespace std;

// adjacency is a dim x dim 0/1 matrix stored in a flat array.
void chordless_cycles(int* adjacency, int dim)
{
    for(int i=0; i<dim-2; i++)
    {
        for(int j=i+1; j<dim-1; j++)
        {
            if(!adjacency[i+j*dim])
                continue;
            list<vector<int> > candidates;
            for(int k=j+1; k<dim; k++)
            {
                if(!adjacency[i+k*dim])
                    continue;
                if(adjacency[j+k*dim])
                {
                    // i, j and k form a triangle
                    cout << i+1 << " " << j+1 << " " << k+1 << endl;
                    continue;
                }
                vector<int> v;
                v.resize(3);
                v[0]=j;
                v[1]=i;
                v[2]=k;
                candidates.push_back(v);
            }
            while(!candidates.empty())
            {
                vector<int> v = candidates.front();
                candidates.pop_front();
                int k = v.back();
                for(int m=i+1; m<dim; m++)
                {
                    if(find(v.begin(), v.end(), m) != v.end())
                        continue;
                    if(!adjacency[m+k*dim])
                        continue;
                    // an edge from m to an internal node of the path is a chord
                    bool chord = false;
                    int n;
                    for(n=1; n<v.size()-1; n++)
                        if(adjacency[m+v[n]*dim])
                            chord = true;
                    if(chord)
                        continue;
                    if(adjacency[m+j*dim])
                    {
                        // m closes the path back to j: output the cycle
                        for(n=0; n<v.size(); n++)
                            cout<<v[n]+1<<" ";
                        cout<<m+1<<endl;
                        continue;
                    }
                    vector<int> w = v;
                    w.push_back(m);
                    candidates.push_back(w);
                }
            }
        }
    }
}
@aioobe has a point. Just find all the cycles and then exclude the non-chordless ones. This may be too inefficient, but the search space can be pruned along the way to reduce the inefficiencies. Here is a general algorithm:
void printChordlessCycles( ChordlessCycle path) {
    System.out.println( path.toString() );
    for( Node n : path.lastNode().getNeighbors() ) {
        if( path.canAdd( n) ) {
            path.add( n);
            printChordlessCycles( path);
            path.remove( n);
        }
    }
}

Graph g = loadGraph(...);
ChordlessCycle p = new ChordlessCycle();
for( Node n : g.getNodes()) {
    p.add(n);
    printChordlessCycles( p);
    p.remove( n);
}

class ChordlessCycle {
    // how many nodes of the current path each node is adjacent to
    private CountedSet<Node> connected_nodes;
    private List<Node> path;
    ...
    public void add( Node n) {
        for( Node neighbor : n.getNeighbors() ) {
            connected_nodes.increment( neighbor);
        }
        path.add( n);
    }
    public void remove( Node n) {
        for( Node neighbor : n.getNeighbors() ) {
            connected_nodes.decrement( neighbor);
        }
        path.remove( n);
    }
    public boolean canAdd( Node n) {
        return (connected_nodes.getCount( n) == 0);
    }
}
Just a thought:
Let's say you are enumerating cycles on your example graph and you are starting from node 0.
If you do a breadth-first search for each given edge, e.g. 0 - 1, you reach a fork at 1. Then the cycles that reach 0 again first are chordless, and the rest are not and can be eliminated... at least I think this is the case.
Could you use an approach like this? Or is there a counterexample?
How about this. First, reduce the problem to finding all chordless cycles that pass through a given vertex A. Once you've found all of those, you can remove A from the graph, and repeat with another point until there's nothing left.
And how to find all the chordless cycles that pass through vertex A? Reduce this to finding all chordless paths from B to A, given a list of permitted vertices, and search either breadth-first or depth-first. Note that when iterating over the vertices reachable (in one step) from B, when you choose one of them you must remove all of the others from the list of permitted vertices (take special care when B=A, so as not to eliminate three-edge paths).
Find all cycles.
A chordless cycle is a cycle whose vertex set contains no smaller cycle: no proper subset of its vertices forms a cycle. So, once you have all cycles, the problem is simply to eliminate the cycles that do contain a subset cycle.
For efficiency, for each cycle you find, loop through all existing cycles and check whether one vertex set is a subset of the other; if so, eliminate the larger cycle.
Beyond that, the only difficulty is figuring out how to write an algorithm that determines whether one set is a subset of another.
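A small Python sketch of the "find all cycles, then filter" idea. It swaps in a different filter from the one described above: instead of comparing cycles against each other as sets, it tests each cycle directly for a chord by counting the edges among its vertices. The graph is assumed to be a dict mapping each vertex to a set of neighbours, with comparable vertex labels; all names are illustrative, and enumerating every simple cycle is exponential in general.

```python
def simple_cycles(adj):
    """Yield each simple cycle of an undirected graph once, as a vertex list."""
    def extend(path, visited):
        start, last = path[0], path[-1]
        for nxt in adj[last]:
            if nxt == start and len(path) >= 3 and path[1] < path[-1]:
                # Keeping only one direction avoids mirrored duplicates.
                yield list(path)
            elif nxt > start and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                yield from extend(path, visited)
                visited.remove(nxt)
                path.pop()

    # Each cycle is reported from its smallest vertex only.
    for v in sorted(adj):
        yield from extend([v], {v})


def is_chordless(cycle, adj):
    """A cycle is chordless when its vertices induce exactly len(cycle) edges."""
    members = set(cycle)
    induced = sum(1 for u in cycle for w in adj[u] if w in members) // 2
    return induced == len(cycle)


def chordless_cycles(adj):
    return [c for c in simple_cycles(adj) if is_chordless(c, adj)]
```

On the example graph from the question (edges 0-1, 0-4, 1-2, 1-3, 2-3, 3-4) this keeps the cycles on {1, 2, 3} and {0, 1, 3, 4} and rejects 0-1-2-3-4, which has the chord 1-3.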

String Tiling Algorithm

I'm looking for an efficient algorithm to do string tiling. Basically, you are given a list of strings, say BCD, CDE, ABC, A, and the resulting tiled string should be ABCDE, because BCD aligns with CDE yielding BCDE, which is then aligned with ABC yielding the final ABCDE.
Currently, I'm using a slightly naïve algorithm, that works as follows. Starting with a random pair of strings, say BCD and CDE, I use the following (in Java):
public static String tile(String first, String second) {
    for (int i = 0; i < first.length() || i < second.length(); i++) {
        // "right" tile (e.g., "BCD" and "CDE")
        String firstTile = first.substring(i);
        // "left" tile (e.g., "CDE" and "BCD")
        String secondTile = second.substring(i);
        if (second.contains(firstTile)) {
            return first.substring(0, i) + second;
        } else if (first.contains(secondTile)) {
            return second.substring(0, i) + first;
        }
    }
    return EMPTY;
}

System.out.println(tile("CDE", "ABCDEF")); // ABCDEF
System.out.println(tile("BCD", "CDE")); // BCDE
System.out.println(tile("CDE", "ABC")); // ABCDE
System.out.println(tile("ABC", tile("BCX", "XYZ"))); // ABCXYZ
Although this works, it's not very efficient, as it iterates over the same characters over and over again.
So, does anybody know a better (more efficient) algorithm to do this? This problem is similar to a DNA sequence alignment problem, so any advice from someone in this field (and others, of course) is very much welcome. Also note that I'm not looking for an alignment, but a tiling, because I require a full overlap of one of the strings over the other.
I'm currently looking for an adaptation of the Rabin-Karp algorithm, in order to improve the asymptotic complexity of the algorithm, but I'd like to hear some advice before delving any further into this matter.
Thanks in advance.
For situations where there is ambiguity -- e.g., {ABC, CBA} which could result in ABCBA or CBABC --, any tiling can be returned. However, this situation seldom occurs, because I'm tiling words, e.g. {This is, is me} => {This is me}, which are manipulated so that the aforementioned algorithm works.
Similar question: Efficient Algorithm for String Concatenation with Overlap
Order the strings by the first character, then length (smallest to largest), and then apply the adaptation to KMP found in this question about concatenating overlapping strings.
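One common way to use the KMP prefix function for the two-string case is sketched below in Python. This is not necessarily the exact adaptation from the linked question: the names are illustrative, and the separator "\0" is assumed never to occur in the input strings. The prefix function of right + separator + left reveals, in linear time, both whether right occurs inside left and how long the overlap between the end of left and the start of right is.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def tile_left_to_right(left, right, sep="\0"):
    """Tile right onto left; return "" if right neither fits inside nor overlaps."""
    if not left or not right:
        return ""
    pi = prefix_function(right + sep + left)
    if max(pi) == len(right):
        return left                    # right occurs entirely inside left
    overlap = pi[-1]                   # longest prefix of right ending left
    return left + right[overlap:] if overlap else ""


def tile(a, b):
    """Try both orders, since tiling is not symmetric."""
    return tile_left_to_right(a, b) or tile_left_to_right(b, a)
```

With this sketch tile("BCD", "CDE") gives BCDE and tile("CDE", "ABC") gives ABCDE. Unlike the tile method in the question, it returns an empty string when the two strings do not overlap at all.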
I think this should work for the tiling of two strings, and be more efficient than your current implementation using substring and contains. Conceptually, I loop across the characters in the 'left' string and compare them to a character in the 'right' string. If the two characters match, I move to the next character in the right string. Depending on which string's end is reached first, and on whether the last compared characters match, one of the possible tiling cases is identified.
I haven't thought of anything to improve the time complexity of tiling more than two strings. As a small note for multiple strings, the algorithm below is easily extended to check the tiling of a single 'left' string with multiple 'right' strings at once, which might save a bit of extra looping over the strings if you're trying to find out whether to do ("ABC", "BCX", "XYZ") or ("ABC", "XYZ", "BCX") by just trying all the possibilities. A bit.
string Tile(string a, string b)
{
    // Try both orderings of a and b,
    // since TileLeftToRight is not commutative.
    string ab = TileLeftToRight(a, b);
    if (ab != "")
        return ab;
    return TileLeftToRight(b, a);
    // Alternatively you could return whichever
    // of the two results is longest, for cases
    // like ("ABC" "BCABC").
}

string TileLeftToRight(string left, string right)
{
    int i = 0;
    int j = 0;
    while (true)
    {
        if (left[i] != right[j])
        {
            // Restart one position after where the current partial match
            // began; without this reset ("ABXC", "BC") is wrongly accepted.
            i = i - j + 1;
            j = 0;
            if (i >= left.Length)
                return "";
        }
        else
        {
            i++;
            j++;
            if (i >= left.Length)
                return left + right.Substring(j);
            if (j >= right.Length)
                return left;
        }
    }
}
If Open Source code is acceptable, then you should check the genome benchmarks in Stanford's STAMP benchmark suite: it does pretty much exactly what you're looking for. Starting with a bunch of strings ("genes"), it looks for the shortest string that incorporates all the genes. So for example if you have ATGC and GCAA, it'll find ATGCAA. There's nothing about the algorithm that limits it to a 4-character alphabet, so this should be able to help you.
The first thing to ask is whether you want to find the tiling of {CDB, CDA}. There is no single tiling.
Interesting problem. You need some kind of backtracking. For example if you have:
ABC, BCD, DBC
Combining DBC with BCD results in:
ABC, DBCD
Which is not solvable. But combining ABC with BCD results in:
ABCD, DBC
Which can be combined to:
ABCDBC.
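The backtracking described above can be sketched in Python as follows. The helper names merge and tile_all are illustrative, the merge rule (containment, else the longest suffix/prefix overlap) is an assumption about what counts as a valid tiling, and the search is exponential in the worst case.

```python
def merge(left, right):
    """Tile right onto left: containment, else longest overlap, else None."""
    if right in left:
        return left
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def tile_all(strings):
    """Return one string that tiles all inputs, or None if no merge order works."""
    if len(strings) == 1:
        return strings[0]
    for i in range(len(strings)):
        for j in range(len(strings)):
            if i == j:
                continue
            merged = merge(strings[i], strings[j])
            if merged is None:
                continue
            # Replace the pair by its merge and recurse; an unsolvable
            # remainder makes the loop move on to the next pair (backtracking).
            rest = [s for k, s in enumerate(strings) if k not in (i, j)]
            result = tile_all(rest + [merged])
            if result is not None:
                return result
    return None
```

For ABC, BCD, DBC this returns ABCDBC, and it returns None for inputs such as CDB, CDA that cannot be tiled.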

Linear Time Voting Algorithm. I don't get it

As I was reading this (Find the most common entry in an array), the Boyer and Moore's Linear Time Voting Algorithm was suggested.
If you follow the link to the site, there is a step by step explanation of how the algorithm works. For the given sequence, AAACCBBCCCBCC it presents the right solution.
When we move the pointer forward over
an element e:
If the counter is 0, we set the current candidate to e and we set the
counter to 1.
If the counter is not 0, we increment or decrement the counter
according to whether e is the current
candidate.
When we are done, the current
candidate is the majority element, if
there is a majority.
If I run this algorithm on paper with AAACCBB as input, the suggested candidate becomes B, which is obviously wrong.
As I see it, there are two possibilities:
The authors have never tried their algorithm on anything other than AAACCBBCCCBCC, are completely incompetent and should be fired on the spot (doubtful).
I am clearly missing something, must get banned from Stack Overflow and never be allowed again to touch anything involving logic.
Note: Here is a C++ implementation of the algorithm from Niek Sanders. I believe he correctly implemented the idea and as such it has the same problem (or doesn't it?).
The algorithm only works when the set has a majority -- more than half of the elements being the same. AAACCBB in your example has no such majority. The most frequent letter occurs 3 times, the string length is 7.
A small but important addition to the other explanations. Moore's Voting algorithm has two parts:
The first part only gives you a candidate for the majority element. Notice the word "candidate" here.
In the second part, we need to iterate over the array once again to determine whether this candidate really is the majority element (i.e. occurs more than size/2 times).
The first iteration finds the candidate and the second iteration checks whether this element occurs in the majority of positions in the given array.
So the time complexity is: O(n) + O(n) = O(n)
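Both passes fit in a few lines of Python. This is a minimal sketch: the function name is illustrative, the input is assumed to be a sequence that can be iterated twice, and None is returned when no element occurs more than half the time.

```python
def majority_element(items):
    """Return the element occurring more than len(items)/2 times, else None."""
    # Pass 1: voting. This only produces a candidate.
    candidate, count = None, 0
    for e in items:
        if count == 0:
            candidate, count = e, 1
        elif e == candidate:
            count += 1
        else:
            count -= 1
    # Pass 2: verify that the candidate really is a majority.
    occurrences = sum(1 for e in items if e == candidate)
    return candidate if 2 * occurrences > len(items) else None
```

On AAACCBBCCCBCC this returns C. On AAACCBB the voting pass ends with candidate B, and the second pass rejects it, so the function returns None.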
From the first linked SO question:
with the property that more than half of the entries in the array are equal to N
From the Boyer and Moore page:
which element of a sequence is in the majority, provided there is such an element
Both of these algorithms explicitly assume that one element occurs more than N/2 times. (Note in particular that "majority" is not the same as "most common.")
I wrote a C++ code for this algorithm
#include <iostream>
#include <vector>

char find_more_than_half_shown_number(char* arr, int len){
    int i=0;
    std::vector<int> vec;   // vec[0] = current candidate, vec[1] = counter
    while(i<len){
        if(vec.empty()){
            vec.push_back(arr[i]);
            vec.push_back(1);
        }else if(vec[0]==arr[i]){
            vec[1]++;
        }else if(vec[1]!=0){
            vec[1]--;
        }else{
            // counter is 0: switch to the new candidate and count it once
            vec[0]=arr[i];
            vec[1]=1;
        }
        i++;
    }
    // second pass: check that the candidate really is a majority
    int tmp_count=0;
    for(int i=0;i<len;i++){
        if(arr[i]==vec[0])
            tmp_count++;
    }
    if(tmp_count>len/2)     // strictly more than half
        return vec[0];
    else
        return -1;
}
and the main function is as below:
int main(int argc, const char * argv[])
{
    char arr[]={'A','A','A','C','C','B','B','C','C','C','B','C','C'};
    int len=sizeof(arr)/sizeof(char);
    char rest_num=find_more_than_half_shown_number(arr,len);
    std::cout << "rest_num="<<rest_num<<std::endl;
    return 0;
}
When the test case is "AAACCBB", the set has no majority: the length of "AAACCBB" is 7, so a majority element would have to occur at least 4 times, but no element occurs more than 3 times.
Here's the code for "the Boyer and Moore's Linear Time Voting Algorithm":
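#include <vector>
using std::vector;

// Returns only the candidate; a second pass is needed to confirm a majority.
int Voting(vector<int> &num) {
    int count = 0;
    int candidate = 0;  // stays 0 only when num is empty
    for(int i = 0; i < num.size(); ++i) {
        if(count == 0) {
            candidate = num[i];
            count = 1;
        }
        else
            count += (candidate == num[i]) ? 1 : -1;
    }
    return candidate;
}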
