As I was reading this (Find the most common entry in an array), Boyer and Moore's Linear Time Voting Algorithm was suggested.
If you follow the link to the site, there is a step-by-step explanation of how the algorithm works. For the given sequence, AAACCBBCCCBCC, it presents the right solution.
When we move the pointer forward over an element e:
If the counter is 0, we set the current candidate to e and we set the counter to 1.
If the counter is not 0, we increment or decrement the counter according to whether e is the current candidate.
When we are done, the current candidate is the majority element, if there is a majority.
If I use this algorithm on a piece of paper with AAACCBB as input, the suggested candidate would be B, which is obviously wrong.
As I see it, there are two possibilities:
The authors have never tried their algorithm on anything other than AAACCBBCCCBCC, are completely incompetent and should be fired on the spot (doubtful).
I am clearly missing something, must get banned from Stack Overflow and never be allowed to touch anything involving logic again.
Note: Here is a C++ implementation of the algorithm from Niek Sanders. I believe he correctly implemented the idea, and as such it has the same problem (or doesn't it?).
The algorithm only works when the set has a majority -- more than half of the elements being the same. AAACCBB in your example has no such majority: the most frequent letter occurs 3 times, but the string length is 7, so a majority would need at least 4 occurrences.
A small but important addition to the other explanations. Moore's voting algorithm has 2 parts:
The first part of running Moore's voting algorithm only gives you a candidate for the majority element. Notice the word "candidate" here.
In the second part, we need to iterate over the array once again to determine whether this candidate occurs more than size/2 times.
The first iteration is to find the candidate, and the second iteration is to check whether this element occurs a majority of times in the given array.
So the time complexity is: O(n) + O(n) = O(n)
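A minimal sketch of the two parts together (C++, with names of my own choosing):

#include <vector>

// Returns the majority element of v, or -1 if there is none.
int majorityOrMinusOne(const std::vector<int>& v) {
    // Part 1: find a candidate.
    int candidate = -1, count = 0;
    for (int x : v) {
        if (count == 0) { candidate = x; count = 1; }
        else if (x == candidate) ++count;
        else --count;
    }
    // Part 2: verify that the candidate occurs more than v.size()/2 times.
    int occurrences = 0;
    for (int x : v)
        if (x == candidate) ++occurrences;
    return occurrences > (int)v.size() / 2 ? candidate : -1;
}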
From the first linked SO question:
with the property that more than half of the entries in the array are equal to N
From the Boyer and Moore page:
which element of a sequence is in the majority, provided there is such an element
Both of these algorithms explicitly assume that one element occurs more than N/2 times. (Note in particular that "majority" is not the same as "most common.")
I wrote C++ code for this algorithm:
#include <iostream>

// Returns the element that occurs more than half the time, or -1 if there
// is no majority.
char find_more_than_half_shown_number(const char* arr, int len){
    // Pass 1: find a candidate.
    char candidate = 0;
    int count = 0;
    for (int i = 0; i < len; i++) {
        if (count == 0) {              // slot empty: adopt a new candidate
            candidate = arr[i];
            count = 1;
        } else if (candidate == arr[i]) {
            count++;                   // same as the candidate: one more vote
        } else {
            count--;                   // different: cancel one vote
        }
    }
    // Pass 2: verify that the candidate really occurs a majority of times.
    int tmp_count = 0;
    for (int i = 0; i < len; i++) {
        if (arr[i] == candidate)
            tmp_count++;
    }
    if (tmp_count > len / 2)           // strictly more than half
        return candidate;
    else
        return -1;
}
and the main function is as below:
int main(int argc, const char * argv[])
{
char arr[]={'A','A','A','C','C','B','B','C','C','C','B','C','C'};
int len=sizeof(arr)/sizeof(char);
char rest_num=find_more_than_half_shown_number(arr,len);
std::cout << "rest_num="<<rest_num<<std::endl;
return 0;
}
When the test case is "AAACCBB", the set has no majority: no element occurs more than 3 times, and since the length of "AAACCBB" is 7, a majority would need at least 4 occurrences.
Here's the code for "Boyer and Moore's Linear Time Voting Algorithm":
int Voting(const std::vector<int> &num) {
    int count = 0;
    int candidate = 0;
    for (size_t i = 0; i < num.size(); ++i) {
        if (count == 0) {
            candidate = num[i];
            count = 1;
        } else if (candidate == num[i]) {
            ++count;
        } else {
            --count;
        }
    }
    // Note: this returns only a candidate; a second pass is still needed
    // to confirm that it occurs more than num.size()/2 times.
    return candidate;
}
I was looking at the following question from Glassdoor:
Given N credit cards, determine if more than half of them belong to the same person/owner. All you have is an array of the credit card numbers, and an API call like isSamePerson(num1, num2).
It is clear how to do it in O(n^2) but some commenters said it can be done in O(n) time. Is it even possible? I mean, if we have an array of credit card numbers where some numbers are repeated, then the claim makes sense. However, we need to make an API call for each credit card number to see its owner.
What am I missing here?
The algorithm goes as follows:
If there is a majority of one item (here, a person), then if you pair together items that are not equal (in any order), this item will be left over.
Start with an empty candidate slot.
For every item:
If the candidate slot is empty (count = 0), place the item there.
Else if it is equal to the item in the slot, increment its count.
Else decrement the count for that slot (pop one item).
If there is nothing left in the candidate slot, there is no clear majority. Otherwise:
Count the number of occurrences of the candidate (second pass).
If the number of occurrences is more than 50%, declare it a winner.
Else there is no majority.
Note this cannot be applied if the threshold is below 50% (but it should be possible to adapt it to a threshold of 33%, 25%, ... by holding two, three, ... candidate slots and popping only a distinct triple, quadruple, ...; see the sketch below).
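For instance, a sketch of the two-slot variant for the 1/3 threshold (the function name is my own; the survivors are only candidates and still need a verification pass):

#include <vector>

// Returns the (at most two) values that can possibly occur more than
// n/3 times; each must still be verified with a second counting pass.
std::vector<int> candidatesOverOneThird(const std::vector<int>& a) {
    int cand1 = 0, cand2 = 0, count1 = 0, count2 = 0;
    for (int x : a) {
        if (count1 > 0 && x == cand1)      ++count1;
        else if (count2 > 0 && x == cand2) ++count2;
        else if (count1 == 0) { cand1 = x; count1 = 1; }  // fill an empty slot
        else if (count2 == 0) { cand2 = x; count2 = 1; }
        else { --count1; --count2; }                      // pop one distinct triple
    }
    std::vector<int> result;
    if (count1 > 0) result.push_back(cand1);
    if (count2 > 0) result.push_back(cand2);
    return result;
}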
This also applies to the case of the credit cards: all you need is to compare two elements (persons) for equality (via the API call), and a counter that is able to accommodate the total number of elements.
Time complexity: O(N)
Space complexity: O(1) + input
API calls: up to 2N-1: one per element in each pass, with no API call for the first element in the first pass.
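A minimal sketch of the whole procedure against the isSamePerson API from the question (the function name and the use of a card index as the candidate are my own choices):

#include <vector>

bool isSamePerson(long num1, long num2); // assumed to be provided, as in the question

// Returns the index of a card whose owner holds more than half of the
// cards, or -1 if there is no such owner.
int majorityOwner(const std::vector<long>& cards) {
    int n = (int)cards.size();
    int candidate = -1, count = 0;
    // First pass: find a candidate.
    for (int i = 0; i < n; ++i) {
        if (count == 0) { candidate = i; count = 1; }       // no API call here
        else if (isSamePerson(cards[i], cards[candidate])) ++count;
        else --count;
    }
    if (count == 0) return -1;    // empty candidate slot: no clear majority
    // Second pass: verify the candidate.
    int occurrences = 0;
    for (int i = 0; i < n; ++i)
        if (isSamePerson(cards[i], cards[candidate]))
            ++occurrences;
    return 2 * occurrences > n ? candidate : -1;
}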
Let x1, x2, ..., xn be the credit card numbers.
Note that since more than half of them belong to the same person, if you pair up adjacent numbers, at least one pair is going to belong to the same person.
If you consider all pairs (x1,x2), (x3,x4), ..., and consider the subset of pairs where both elements belong to the same person, a majority of these same-person pairs belong to the person who had the majority of cards in the first place. So, for every same-person pair keep one of the card numbers, and for every non-same-person pair discard both. Do this recursively and return the last remaining same-person pair.
You need to perform at most n comparisons.
NOTE: If n is odd, keep the unpaired number.
Why this works: consider a case where n is even and person A owns n/2 + 1 cards. In the worst case you have exactly one pair where both cards are owned by A; none of the other pairs are owned by the same person (each other pair contains one card of A and a card of another person).
Now, for B (a non-A person) to contribute a matching pair, two of B's cards have to be paired together, which in turn frees two of A's cards to form a pair as well. Therefore, at every stage a majority of the matching pairs are owned by A.
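A sketch of this pairing recursion in C++ (isSamePerson assumed as in the question; -1 is used as a sentinel, and as the other answers note, the survivor is only a candidate until it is verified with one counting pass):

#include <vector>

bool isSamePerson(long num1, long num2); // assumed to be provided

// Repeatedly pair up the cards: keep one card from every same-person pair,
// discard mismatched pairs, and carry over the unpaired card when the count
// is odd. At most n comparisons are made in total.
long pairingCandidate(std::vector<long> cards) {
    while (cards.size() > 1) {
        std::vector<long> next;
        size_t i = 0;
        for (; i + 1 < cards.size(); i += 2)
            if (isSamePerson(cards[i], cards[i + 1]))
                next.push_back(cards[i]);   // keep one card of the pair
        if (i < cards.size())
            next.push_back(cards[i]);       // odd count: keep the unpaired card
        if (next.empty())
            return -1;                      // every pair mismatched
        cards.swap(next);
    }
    return cards.empty() ? -1 : cards[0];
}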
I don't have the reputation to comment. The method described by Jan Dvorak is known as Moore's Voting Algorithm (Stream Counting Algorithm). Here's the code.
int majorityElement(int* nums, int numsSize) {
    int count = 0, i, majorityElement;
    /* Moore's voting algorithm. Time: O(n), auxiliary space: O(1). */
    /* Step 1: find a candidate for the majority element. */
    for (i = 0; i < numsSize; i++)
    {
        if (count == 0)
            majorityElement = nums[i];
        if (nums[i] == majorityElement)
            count++;
        else
            count--;
    }
    /* Step 2: check whether this candidate occurs more than numsSize/2 times. */
    count = 0;
    for (i = 0; i < numsSize; i++)
    {
        if (nums[i] == majorityElement)
            count++;
    }
    if (count > numsSize/2) /* accept the candidate only if it is a true majority */
        return majorityElement;
    return -1;
}
The question is to find the majority element in an array. Instead of the Boyer-Moore majority vote algorithm, I am doing this using a HashMap.
import java.util.HashMap;
import java.util.Map;

public class majorityElement1 {
    public static void main(String[] args) {
        int a[] = {2,2,2,2,5,5,2,3,3,3,3,3,3,33,3};
        fix(a);
    }

    public static void fix(int[] a) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < a.length; i++) {
            int r = a[i];
            if (!map.containsKey(r)) {
                map.put(r, 1);
            } else {
                // Note: this checks >= a.length/2, not a strict majority,
                // and prints nothing if no count reaches the threshold.
                if (map.get(r) + 1 >= a.length / 2) {
                    System.out.println("majority element => " + r);
                    return;
                } else {
                    map.put(r, map.get(r) + 1);
                }
            }
        }
    }
}
The output is 3.
DONE IN ONE PASS:
1. Start from the second index of the array, say i = 1 initially.
2. Initially count = 1.
3. Call isSamePerson(a[i], a[i-1]), where array a[] contains the credit card numbers.
4. If the returned value is positive, do count++ and i++; else if the returned value is 0 and count == 1, i++; else if the returned value is 0 and count > 1, do count-- and i++.
5. If i != (n-1), go to step 3, where n is the number of cards.
6. Else, if at the end of the array count > 1, then more than half of the cards belong to a single person; else there is no clear majority of over 50%.
I hope that this is understandable, and writing the code would be an easy thing.
TIME COMPLEXITY - O(N)
NUMBER OF API CALLS = N-1
SPACE COMPLEXITY - O(1)
For finding the position of a fraction in the Farey sequence, I tried to implement the algorithm given here http://www.math.harvard.edu/~corina/publications/farey.pdf under "initial algorithm", but I can't understand where I'm going wrong; I am not getting the correct answers. Could someone please point out my mistake?
E.g. for order n = 7 and the fractions 1/7 and 1/6, I get the same answer.
Here's what I've tried for a given degree (n) and a fraction a/b:
sum = 0;
int A[100000];
A[1] = a;
for(i = 2; i <= n; i++)
    A[i] = i*a - a;
for(i = 2; i <= n; i++)
{
    for(j = i+i; j <= n; j += i)
        A[j] -= A[i];
}
for(i = 1; i <= n; i++)
    sum += A[i];
ans = sum/b;
Thanks.
Your algorithm doesn't use any particular properties of a and b. In the first part, every relevant entry of the array A is a multiple of a, but the factor is independent of a, b and n. Setting up the array ignoring the factor a, i.e. starting with A[1] = 1, A[i] = i-1 for 2 <= i <= n, after the nested loops, the array contains the totients, i.e. A[i] = phi(i), no matter what a, b, n are. The sum of the totients from 1 to n is the number of elements of the Farey sequence of order n (plus or minus 1, depending on which of 0/1 and 1/1 are included in the definition you use). So your answer is always the approximation (a*number of terms)/b, which is close but not exact.
I've not yet looked at how yours relates to the algorithm in the paper, check back for updates later.
Addendum: Finally had time to look at the paper. Your initialisation is not what they give. In their algorithm, A[q] is initialised to floor(x*q); for a rational x = a/b, the correct initialisation is
for(i = 1; i <= n; ++i){
    A[i] = (a*i)/b;
}
In the remainder of your code, only ans = sum/b; has to be changed, to ans = sum;.
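Putting the corrected initialisation together with the rest of the code, a runnable sketch could look like this (fareyRank is a name of my choosing; the sieve starts at q = 1 so that a/b = 1/1 is handled as well):

#include <iostream>
#include <vector>

// Rank of the reduced fraction a/b in the Farey sequence of order n,
// not counting 0/1. A[q] = floor(a*q/b) first counts all fractions
// p/q <= a/b, reduced or not; the sieve then removes the non-reduced ones.
long long fareyRank(int a, int b, int n) {
    std::vector<long long> A(n + 1);
    for (int q = 1; q <= n; ++q)
        A[q] = (long long)a * q / b;           // floor(a*q/b)
    for (int q = 1; q <= n; ++q)
        for (int m = 2 * q; m <= n; m += q)
            A[m] -= A[q];                      // remove non-reduced fractions
    long long sum = 0;
    for (int q = 1; q <= n; ++q)
        sum += A[q];
    return sum;
}

int main() {
    // The example from the question: order 7, fractions 1/7 and 1/6.
    std::cout << fareyRank(1, 7, 7) << "\n";   // prints 1
    std::cout << fareyRank(1, 6, 7) << "\n";   // prints 2
}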
A non-algorithmic way of finding the position t of a fraction in the Farey sequence of order n>1 is shown in Remark 7.10(ii)(a) of the paper, under m:=n-1, where mu-bar stands for the number-theoretic Mobius function on positive integers taking values from the set {-1,0,1}.
Here's my Java solution that works. Add head (0/1) and tail (1/1) nodes to a singly linked list.
Then start by passing headNode and tailNode, and set the required order level.
// Recursively inserts mediants between leftNode and rightNode until the
// denominator would exceed the order of the sequence (getMaxLevel()).
public void generateSequence(Node leftNode, Node rightNode){
    Fraction left = (Fraction) leftNode.getData();
    Fraction right = (Fraction) rightNode.getData();
    FractionNode midNode = null;
    // The mediant of a/b and c/d is (a+c)/(b+d).
    int midNum = left.getNum() + right.getNum();
    int midDenom = left.getDenom() + right.getDenom();
    if (midDenom <= getMaxLevel()) {
        Fraction middle = new Fraction(midNum, midDenom);
        midNode = new FractionNode(middle);
    }
    if (midNode != null) {
        // Splice the mediant into the list and keep refining the left half.
        leftNode.setNext(midNode);
        midNode.setNext(rightNode);
        generateSequence(leftNode, midNode);
        count++;
    } else if (rightNode.next() != null) {
        // This gap is fully refined: move one node to the right.
        generateSequence(rightNode, rightNode.next());
    }
}
I have a problem which is a pretty clear instance of the subset sum problem:
"given a list of Integers in the range [-65000,65000], the function returns true if any subset of the list summed is equal to zero. False otherwise."
What I wanted to ask is more of an explanation than a solution.
This was an instance-specific solution I came up with before thinking about the complexity of the problem.
Sort the array A[] and, during the sort, add each element to a counter 'extSum' (O(N log N)).
Define two indexes, low = 0 and high = n - 1 (so A[low] is the smallest element and A[high] the largest).
Here is the deciding code:
while(A[low] < 0){
    sum = extSum;
    if(extSum > 0){
        while(sum - A[high] < sum){
            tmp = sum - A[high];
            if(tmp == 0) return true;
            else if(tmp > 0){
                sum = tmp;
                high--;
            }
            else{
                high--;
            }
        }
        extSum -= A[low];
        low++;
        high = n - 1;
    }
    else{
        /* Symmetric code: switch low, high and the operation > and < */
    }
}
return false;
First of all, is this solution correct? I made some tests, but I am not sure...it looks too easy...
Isn't the time complexity of this code O(n^2)?
I have already read the various DP solutions. What I would like to understand is, for the specific instance of the problem I am facing, how much better they are than this naive and intuitive solution. I know my approach can be improved a lot, but nothing that would make a big difference when it comes to the time complexity...
Thank you for the clarifications
EDIT: One obvious optimization would be that, while sorting, if a 0 is found, the function returns true immediately... but that only covers the specific case in which there are 0s in the array.
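For reference, a sketch of the usual DP for this instance uses a bitset over every reachable subset sum; the bound of at most 100 elements below is my assumption, not part of the problem statement:

#include <bitset>
#include <vector>

const int LIMIT = 65000 * 100;    // max |sum| for at most 100 elements
const int WIDTH = 2 * LIMIT + 1;

bool hasZeroSubset(const std::vector<int>& a) {
    // reach[s + LIMIT] is true iff some non-empty subset sums to s.
    static std::bitset<WIDTH> reach;
    reach.reset();
    for (int x : a) {
        // Extend every known sum by x, then add the singleton subset {x}.
        std::bitset<WIDTH> shifted = (x >= 0) ? (reach << x) : (reach >> -x);
        reach |= shifted;
        reach[LIMIT + x] = true;
    }
    return reach[LIMIT];          // some non-empty subset sums to zero
}

This is pseudo-polynomial, O(n * WIDTH) bit operations, which is where the DP wins over exhaustive subset enumeration.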
Hmm, I think the input {0} will beat your answer: after sorting, A[low] is not negative, so the while loop is skipped and the function returns false, even though the subset {0} sums to zero.
I was recently asked the following interview question:
You have a dictionary page written in an alien language. Assume that the language is similar to English and is read/written from left to right. Also, the words are arranged in lexicographic order. For example the page could be: ADG, ADH, BCD, BCF, FM, FN
You have to give all lexicographic orderings possible of the character set present in the page.
My approach is as follows:
A has higher precedence than B and G has higher precedence than H.
Therefore we have the information about ordering for some characters:
A->B, B->F, G->H, D->F, M->N
The possible orderings can be ABDFGNHMC, ACBDFGNHMC, ...
My approach was to use an array as a position holder and generate all permutations to identify all valid orderings. The worst-case time complexity for this is N!, where N is the size of the character set.
Can we do better than the brute-force approach?
Thanks in advance.
Donald Knuth has written the paper A Structured Program to Generate all Topological Sorting Arrangements. This paper was originally published in 1974. The following quote from the paper brought me to a better understanding of the problem (in the text, the relation i < j stands for "i precedes j"):
A natural way to solve this problem is to let x_1 be an element having no predecessors, then to erase all relations of the form x_1 < j and to let x_2 be an element ≠ x_1 with no predecessors in the system as it now exists, then to erase all relations of the form x_2 < j, etc. It is not difficult to verify that this method will always succeed unless there is an oriented cycle in the input. Moreover, in a sense it is the only way to proceed, since x_1 must be an element without predecessors, and x_2 must be without predecessors when all relations x_1 < j are deleted, etc. This observation leads naturally to an algorithm that finds all solutions to the topological sorting problem; it is a typical example of a "backtrack" procedure, where at every stage we consider a subproblem of the form "Find all ways to complete a given partial permutation x_1 x_2 ... x_k to a topological sort x_1 x_2 ... x_n." The general method is to branch on all possible choices of x_{k+1}. A central problem in backtrack applications is to find a suitable way to arrange the data so that it is easy to sequence through the possible choices of x_{k+1}; in this case we need an efficient way to discover the set of all elements ≠ {x_1, ..., x_k} which have no predecessors other than x_1, ..., x_k, and to maintain this knowledge efficiently as we move from one subproblem to another.
The paper includes pseudocode for an efficient algorithm. The time complexity for each output is O(m+n), where m is the number of input relations and n is the number of letters. I have written a C++ program that implements the algorithm described in the paper (maintaining variable and function names), which takes the letters and relations from your question as input. I hope that nobody complains about adding the program to this answer, because of the language-agnostic tag.
#include <iostream>
#include <algorithm>
#include <deque>
#include <vector>
#include <iterator>
#include <map>
// Define Input
static const char input[] =
    { 'A', 'D', 'G', 'H', 'B', 'C', 'F', 'M', 'N' };
static const char crel[][2] =
    {{'A', 'B'}, {'B', 'F'}, {'G', 'H'}, {'D', 'F'}, {'M', 'N'}};
static const int n = sizeof(input) / sizeof(char);
static const int m = sizeof(crel) / sizeof(*crel);

std::map<char, int> count;
std::map<char, int> top;
std::map<int, char> suc;
std::map<int, int> next;
std::deque<char> D;
std::vector<char> buffer;

void alltopsorts(int k)
{
    if (D.empty())
        return;
    char base = D.back();
    do
    {
        char q = D.back();
        D.pop_back();
        buffer[k] = q;
        if (k == (n - 1))
        {
            for (std::vector<char>::const_iterator cit = buffer.begin();
                 cit != buffer.end(); ++cit)
                std::cout << (*cit);
            std::cout << std::endl;
        }
        // erase relations beginning with q:
        int p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            count[j]--;
            if (!count[j])
                D.push_back(j);
            p = next[p];
        }
        alltopsorts(k + 1);
        // retrieve relations beginning with q:
        p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            if (!count[j])
                D.pop_back();
            count[j]++;
            p = next[p];
        }
        D.push_front(q);
    }
    while (D.back() != base);
}

int main()
{
    // Prepare
    std::fill_n(std::back_inserter(buffer), n, 0);
    for (int i = 0; i < n; i++) {
        count[input[i]] = 0;
        top[input[i]] = -1;
    }
    for (int i = 0; i < m; i++) {
        suc[i] = crel[i][1]; next[i] = top[crel[i][0]];
        top[crel[i][0]] = i; count[crel[i][1]]++;
    }
    for (std::map<char, int>::const_iterator cit = count.begin();
         cit != count.end(); ++cit)
        if (!(*cit).second)
            D.push_back((*cit).first);
    alltopsorts(0);
}
There is no algorithm that can do better than O(N!) if there are N! answers. But I think there is a better way to understand the problem:
You can build a directed graph in this way: if A appears before B, then there is an edge from A to B. After building the graph, you just need to find all possible topological sort results. Still O(N!) in the worst case, but easier to code and better than your approach (you don't have to generate invalid orderings).
I would solve it like this:
Look at the first letter: (A -> B -> F)
Look at the second letter, but only among words with the same first letter: (D), (C), (M -> N)
Look at the third letter, but only among words with the same first and second letters: (G -> H), (D -> F)
And so on, while something remains (look at the Nth letter, grouping by the previous letters).
What is in parentheses is all the ordering information you get from the set (all the possible orderings). Ignore parentheses with only one letter, because they do not represent an ordering. Then take everything in parentheses and topologically sort it (a sketch of extracting these constraints follows below).
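A possible sketch of that extraction step (C++; the page from the question is hard-coded, and for each adjacent pair of words the first differing position yields one constraint):

#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::string> page = {"ADG", "ADH", "BCD", "BCF", "FM", "FN"};
    std::set<std::pair<char, char>> edges;   // (x, y) means x precedes y
    for (size_t w = 0; w + 1 < page.size(); ++w) {
        const std::string &s = page[w], &t = page[w + 1];
        for (size_t k = 0; k < s.size() && k < t.size(); ++k) {
            if (s[k] != t[k]) {              // first differing position
                edges.insert({s[k], t[k]});
                break;
            }
        }
    }
    for (const auto &e : edges)              // A->B, B->F, D->F, G->H, M->N
        std::cout << e.first << " -> " << e.second << "\n";
}

These edges are exactly the relations derived in the question; feeding them to an all-topological-sorts routine (such as the program in the answer above) enumerates the valid orderings.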
OK, I admit straight away that I don't have an estimate of time complexity for the average case, but maybe the following two observations will help.
First, this is an obvious candidate for a constraint library. If you were doing this in practice (say, it was some task at work), then you would get a constraint solver, give it the various pairwise orderings you have, and then ask for a list of all results.
Second, that is typically implemented as a search. If you have N characters, consider a tree whose root node has N children (selection of the first character); the next node has N-1 children (selection of the second character); etc. Clearly this is N! worst case for full exploration.
Even with a "dumb" search, you can see that you can often prune searches by checking your order at any point against the pairs that you have.
But since you know that a total ordering exists, even though you (may) only have partial information, you can make the search more efficient. For example, you know that the first character must not appear on the right-hand side of any pair (if we assume that each character is given a numerical value, with the first character being lowest). Similarly, moving down the tree, for the appropriately reduced data.
In short, you can enumerate possible solutions by exploring a tree, using the incomplete ordering information to constrain possible choices at each node.
Hope that helps some.
I have an assignment to create an algorithm to find duplicates in an array of numeric values, but it is not said which kind of numbers they are, integers or floats. I have written the following pseudocode:
FindingDuplicateAlgorithm(A) // A is the array
    mergeSort(A);
    for int i <- 0 to i < A.length
        if A[i] == A[i+1]
            i++
            return A[i]
        else
            i++
Have I created an efficient algorithm?
I think there is a problem in my algorithm: it returns duplicate numbers several times. For example, if the array includes 2 at two indexes, I will have ...2, 2,... in the output. How can I change it to return each duplicate only one time?
I think it is a good algorithm for integers, but does it work well for floating-point numbers too?
To report each duplicate only once, you can do the following:
if A[i] == A[i+1]:
    result.append(A[i])    # collect found duplicates in a list
    while A[i] == A[i+1]:  # skip the entire range of duplicates
        i++                # until a new value is found
Do you want to find duplicates in Java?
You may use a HashSet.
HashSet h = new HashSet();
for (Object a : A) {
    boolean b = h.add(a);
    boolean duplicate = !b;
    if (duplicate) {
        // do something with a;
    }
}
The return value of add() is defined as: true if the set did not already contain the specified element.
EDIT:
I know HashSet is optimized for inserts and contains operations, but I'm not sure if it's fast enough for your concerns.
EDIT2:
I've seen you recently added the homework tag. I would not prefer my answer if it's homework, because it may be too "high-level" for an algorithm lesson.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashSet.html#add%28java.lang.Object%29
Your answer seems pretty good. First sorting and then simply checking neighboring values gives you O(n log(n)) complexity, which is quite efficient.
Merge sort is O(n log(n)), while checking neighboring values is simply O(n).
One thing though (as mentioned in one of the comments): you are going to get a stack overflow (lol) with your pseudocode. The inner loop should be (in Java):
for (int i = 0; i < array.length - 1; i++) {
    ...
}
Then also, if you actually want to display which numbers (and/or indexes) are the duplicates, you will need to store them in a separate list.
I'm not sure what language you need to write the algorithm in, but there are some really good C++ solutions in response to my question here. Should be of use to you.
O(n) algorithm: traverse the array and try to insert each element into a hash table/set, with the number as the hash key. If you cannot insert it, then it's a duplicate.
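A minimal sketch of this in C++ (the sample data is made up for illustration):

#include <iostream>
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> a = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
    std::unordered_set<int> seen;
    for (int x : a) {
        // insert() returns {iterator, false} when x is already present.
        if (!seen.insert(x).second)
            std::cout << x << " is a duplicate\n";
    }
}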
Your algorithm contains a buffer overrun. i starts with 0, so I assume the indexes into array A are zero-based, i.e. the first element is A[0], the last is A[A.length-1]. Now i counts up to A.length-1, and in the loop body accesses A[i+1], which is out of the array for the last iteration. Or, simply put: If you're comparing each element with the next element, you can only do length-1 comparisons.
If you only want to report each duplicate once, I'd use a bool variable firstDuplicate: report a number only if firstDuplicate is true, set it to false once you have reported a duplicate, and set it back to true whenever the number differs from the next one.
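A short sketch of that flag logic on a sorted array (assuming ints; the function name is my own):

#include <iostream>

// Reports each run of equal values in a sorted array exactly once.
void reportDuplicatesOnce(const int a[], int length) {
    bool firstDuplicate = true;
    for (int i = 0; i < length - 1; i++) {   // length-1 comparisons, as noted above
        if (a[i] == a[i + 1]) {
            if (firstDuplicate)
                std::cout << a[i] << " ";    // report the first duplicate of a run
            firstDuplicate = false;
        } else {
            firstDuplicate = true;           // new value: reset the flag
        }
    }
}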
// Prints each duplicate by negating the value at index |value|. This trick
// assumes all values are non-negative and less than inputArray.length
// (and it cannot mark the value 0, since -0 == 0).
public void printDuplicates(int[] inputArray) {
    if (inputArray == null) {
        throw new IllegalArgumentException("Input array can not be null");
    }
    for (int i = 0; i < inputArray.length; i++) {
        int value = Math.abs(inputArray[i]);
        if (inputArray[value] >= 0) {
            inputArray[value] = -inputArray[value]; // first time: mark as seen
        } else {
            System.out.print(value + " ");          // already marked: duplicate
        }
    }
}
}