Generate all combinations of arbitrary alphabet up to arbitrary length - algorithm

Say I have an array of arbitrary size holding single characters. I want to compute all possible combinations of those characters up to an arbitrary length.
So lets say my array is [1, 2, 3]. The user-specified length is 2. Then the possible combinations are [11, 22, 33, 12, 13, 23, 21, 31, 32].
I'm having real trouble finding a suitable algorithm that allows arbitrary lengths and not just permutates the array. Oh and while speed is not absolutely critical, it should be reasonably fast too.

Just do an add with carry.
Say your array contained 4 symbols and you want ones of length 3.
Start with 000 (i.e. each symbol on your word = alphabet[0])
Then add up:
000
001
002
003
010
011
...
The algorithm (given these indices) is just to increase the lowest number. If it reaches the number of symbols in your alphabet, increase the previous number (following the same rule) and set the current to 0.
C++ code:
int N_LETTERS = 4;
char alphabet[] = {'a', 'b', 'c', 'd'};
std::vector<std::string> get_all_words(int length)
{
std::vector<int> index(length, 0);
std::vector<std::string> words;
while(true)
{
std::string word(length);
for (int i = 0; i < length; ++i)
word[i] = alphabet[index[i]];
words.push_back(word);
for (int i = length-1; ; --i)
{
if (i < 0) return words;
index[i]++;
if (index[i] == N_LETTERS)
index[i] = 0;
else
break;
}
}
}
Code is untested, but should do the trick.

Knuth covers combinations and permutations in some depth in The Art of Computer Programming, vol 1. Here is an implementation of one of his algorithms I wrote some years ago (don't hate on the style, its ancient code):
#include <algorithm>
#include <vector>
#include <functional>
#include <iostream>
using namespace std;
template<class BidirectionalIterator, class Function, class Size>
Function _permute(BidirectionalIterator first, BidirectionalIterator last, Size k, Function f, Size n, Size level)
{
// This algorithm is adapted from Donald Knuth,
// "The Art of Computer Programming, vol. 1, p. 45, Method 1"
// Thanks, Donald.
for( Size x = 0; x < (n-level); ++x ) // rotate every possible value in to this level's slot
{
if( (level+1) < k )
// if not at max level, recurse down to twirl higher levels first
f = _permute(first,last,k,f,n,level+1);
else
{
// we are at highest level, this is a unique permutation
BidirectionalIterator permEnd = first;
advance(permEnd, k);
f(first,permEnd);
}
// rotate next element in to this level's position & continue
BidirectionalIterator rotbegin(first);
advance(rotbegin,level);
BidirectionalIterator rotmid(rotbegin);
rotmid++;
rotate(rotbegin,rotmid,last);
}
return f;
}
template<class BidirectionalIterator, class Function, class Size>
Function for_each_permutation(BidirectionalIterator first, BidirectionalIterator last, Size k, Function fn)
{
return _permute<BidirectionalIterator,Function,Size>(first, last, k, fn, distance(first,last), 0);
}
template<class Elem>
struct DumpPermutation : public std::binary_function<bool, Elem* , Elem*>
{
bool operator()(Elem* begin, Elem* end) const
{
cout << "[";
copy(begin, end, ostream_iterator<Elem>(cout, " "));
cout << "]" << endl;
return true;
}
};
int main()
{
int ary[] = {1, 2, 3};
const size_t arySize = sizeof(ary)/sizeof(ary[0]);
for_each_permutation(&ary[0], &ary[arySize], 2, DumpPermutation<int>());
return 0;
}
Output of this program is:
[1 2 ]
[1 3 ]
[2 3 ]
[2 1 ]
[3 1 ]
[3 2 ]
If you want your combinations to include repeated elements like [11] [22] and [33], you can generate your list of combinations using the algorithm above, and then append to the generated list new elements, by doing something like this:
for( size_t i = 0; i < arySize; ++i )
{
cout << "[";
for( int j = 0; j < k; ++j )
cout << ary[i] << " ";
cout << "]" << endl;
}
...and the program output now becomes:
[1 2 ]
[1 3 ]
[2 3 ]
[2 1 ]
[3 1 ]
[3 2 ]
[1 1 ]
[2 2 ]
[3 3 ]

One way to do it would be with a simple counter that you internally interpret as base N, where N is the number of items in the array. You then extract each digit from the base N counter and use it as an index into your array. So if your array is [1,2] and the user specified length is 2, you have
Counter = 0, indexes are 0, 0
Counter = 1, indexes are 0, 1
Counter = 2, indexes are 1, 0
Counter = 3, indexes are 1, 1
The trick here will be your base-10 to base-N conversion code, which isn't terribly difficult.

If you know the length before hand, all you need is some for loops. Say, for length = 3:
for ( i = 0; i < N; i++ )
for ( j = 0; j < N; j++ )
for ( k = 0; k < N; k++ )
you now have ( i, j, k ), or a_i, a_j, a_k
Now to generalize it, just do it recursively, each step of the recursion with one of the for loops:
recurse( int[] a, int[] result, int index)
if ( index == N ) base case, process result
else
for ( i = 0; i < N; i++ ) {
result[index] = a[i]
recurse( a, result, index + 1 )
}
Of course, if you simply want all combinations, you can just think of each step as an N-based number, from 1 to k^N - 1, where k is the length.
Basically you would get, in base N (for k = 4):
0000 // take the first element four times
0001 // take the first element three times, then the second element
0002
...
000(N-1) // take the first element three times, then take the N-th element
1000 // take the second element, then the first element three times
1001
..
(N-1)(N-1)(N-1)(N-1) // take the last element four times

Using Peter's algorithm works great; however, if your letter set is too large or your string size too long, attempting to put all of the permutations in an array and returning the array won't work. The size of the array will be the size of the alphabet raised to the length of the string.
I created this in perl to take care of the problem:
package Combiner;
#package used to grab all possible combinations of a set of letters. Gets one every call, allowing reduced memory usage and faster processing.
use strict;
use warnings;
#initiate to use nextWord
#arguments are an array reference for the list of letters and the number of characters to be in the generated strings.
sub new {
my ($class, $phoneList,$length) = #_;
my $self = bless {
phoneList => $phoneList,
length => $length,
N_LETTERS => scalar #$phoneList,
}, $class;
$self->init;
$self;
}
sub init {
my ($self) = shift;
$self->{lindex} = [(0) x $self->{length}];
$self->{end} = 0;
$self;
}
#returns all possible combinations of N phonemes, one at a time.
sub nextWord {
my $self = shift;
return 0 if $self->{end} == 1;
my $word = [('-') x $self->{length}];
$$word[$_] = ${$self->{phoneList}}[${$self->{lindex}}[$_]]
for(0..$self->{length}-1);
#treat the string like addition; loop through 000, 001, 002, 010, 020, etc.
for(my $i = $self->{length}-1;;$i--){
if($i < 0){
$self->{end} = 1;
return $word;
}
${$self->{lindex}}[$i]++;
if (${$self->{lindex}}[$i] == $self->{N_LETTERS}){
${$self->{lindex}}[$i] = 0;
}
else{
return $word;
}
}
}
Call it like this: my $c = Combiner->new(['a','b','c','d'],20);. Then call nextWord to grab the next word; if nextWord returns 0, it means it's done.

Here's my implementation in Haskell:
g :: [a] -> [[a]] -> [[a]]
g alphabet = concat . map (\xs -> [ xs ++ [s] | s <- alphabet])
allwords :: [a] -> [[a]]
allwords alphabet = concat $ iterate (g alphabet) [[]]
Load this script into GHCi. Suppose that we want to find all strings of length less than or equal to 2 over the alphabet {'a','b','c'}. The following GHCi session does that:
*Main> take 13 $ allwords ['a','b','c']
["","a","b","c","aa","ab","ac","ba","bb","bc","ca","cb","cc"]
Or, if you want just the strings of length equal to 2:
*Main> filter (\xs -> length xs == 2) $ take 13 $ allwords ['a','b','c']
["aa","ab","ac","ba","bb","bc","ca","cb","cc"]
Be careful with allwords ['a','b','c'] for it is an infinite list!

This is written by me. may be helpful for u...
#include<stdio.h>
#include <unistd.h>
void main()
{
FILE *file;
int i=0,f,l1,l2,l3=0;
char set[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890!##$%&*.!##$%^&*()";
int size=sizeof(set)-1;
char per[]="000";
//check urs all entered details here//
printf("Setlength=%d Comination are genrating\n",size);
// writing permutation here for length of 3//
for(l1=0;l1<size;l1++)
//first for loop which control left most char printed in file//
{
per[0]=set[l1];
// second for loop which control all intermediate char printed in file//
for(l2=0;l2<size;l2++)
{
per[1]=set[l2];
//third for loop which control right most char printed in file//
for(l3=0;l3<size;l3++)
{
per[2]=set[l3];
//apend file (add text to a file or create a file if it does not exist.//
file = fopen("file.txt","a+");
//writes array per to file named file.txt//
fprintf(file,"%s\n",per);
///Writing to file is completed//
fclose(file);
i++;
printf("Genrating Combination %d\r",i);
fflush(stdout);``
usleep(1);
}
}
}
printf("\n%d combination has been genrate out of entered data of length %d \n",i,size);
puts("No combination is left :) ");
puts("Press any butoon to exit");
getchar();
}

Related

Find all anagrams in a string O(n) solution

Here is the problem:
Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.
Input: s: "cbaebabacd" p: "abc"
Output: [0, 6]
Input: s: "abab" p: "ab"
Output: [0, 1, 2]
Here is my solution
vector<int> findAnagrams(string s, string p) {
vector<int> res, s_map(26,0), p_map(26,0);
int s_len = s.size();
int p_len = p.size();
if (s_len < p_len) return res;
for (int i = 0; i < p_len; i++) {
++s_map[s[i] - 'a'];
++p_map[p[i] - 'a'];
}
if (s_map == p_map)
res.push_back(0);
for (int i = p_len; i < s_len; i++) {
++s_map[s[i] - 'a'];
--s_map[s[i - p_len] - 'a'];
if (s_map == p_map)
res.push_back(i - p_len + 1);
}
return res;
}
However, I think it is O(n^2) solution because I have to compare vectors s_map and p_map.
Does a O(n) solution exist for this problem?
lets say p has size n.
lets say you have an array A of size 26 that is filled with the number of a,b,c,... which p contains.
then you create a new array B of size 26 filled with 0.
lets call the given (big) string s.
first of all you initialize B with the number of a,b,c,... in the first n chars of s.
then you iterate through each word of size n in s always updating B to fit this n-sized word.
always B matches A you will have an index where we have an anagram.
to change B from one n-sized word to another, notice you just have to remove in B the first char of the previous word and add the new char of the next word.
Look at the example:
Input
s: "cbaebabacd"
p: "abc" n = 3 (size of p)
A = {1, 1, 1, 0, 0, 0, ... } // p contains just 1a, 1b and 1c.
B = {1, 1, 1, 0, 0, 0, ... } // initially, the first n-sized word contains this.
compare(A,B)
for i = n; i < size of s; i++ {
B[ s[i-n] ]--;
B[ s[ i ] ]++;
compare(A,B)
}
and suppose that compare(A,B) prints the index always A matches B.
the total complexity will be:
first fill of A = O(size of p)
first fill of B = O(size of s)
first comparison = O(26)
for-loop = |s| * (2 + O(26)) = |s| * O(28) = O(28|s|) = O(size of s)
____________________________________________________________________
2 * O(size of s) + O(size of p) + O(26)
which is linear in size of s.
Your solution is the O(n) solution. The size of the s_map and p_map vectors is a constant (26) that doesn't depend on n. So the comparison between s_map and p_map takes a constant amount of time regardless of how big n is.
Your solution takes about 26 * n integer comparisons to complete, which is O(n).
// In papers on string searching algorithms, the alphabet is often
// called Sigma, and it is often not considered a constant. Your
// algorthm works in (Sigma * n) time, where n is the length of the
// longer string. Below is an algorithm that works in O(n) time even
// when Sigma is too large to make an array of size Sigma, as long as
// values from Sigma are a constant number of "machine words".
// This solution works in O(n) time "with high probability", meaning
// that for all c > 2 the probability that the algorithm takes more
// than c*n time is 1-o(n^-c). This is a looser bound than O(n)
// worst-cast because it uses hash tables, which depend on randomness.
#include <functional>
#include <iostream>
#include <type_traits>
#include <vector>
#include <unordered_map>
#include <vector>
using namespace std;
// Finding a needle in a haystack. This works for any iterable type
// whose members can be stored as keys of an unordered_map.
template <typename T>
vector<size_t> AnagramLocations(const T& needle, const T& haystack) {
// Think of a contiguous region of an ordered container as
// representing a function f with the domain being the type of item
// stored in the container and the codomain being the natural
// numbers. We say that f(x) = n when there are n x's in the
// contiguous region.
//
// Then two contiguous regions are anagrams when they have the same
// function. We can track how close they are to being anagrams by
// subtracting one function from the other, pointwise. When that
// difference is uniformly 0, then the regions are anagrams.
unordered_map<remove_const_t<remove_reference_t<decltype(*needle.begin())>>,
intmax_t> difference;
// As we iterate through the haystack, we track the lead (part
// closest to the end) and lag (part closest to the beginning) of a
// contiguous region in the haystack. When we move the region
// forward by one, one part of the function f is increased by +1 and
// one part is decreased by -1, so the same is true of difference.
auto lag = haystack.begin(), lead = haystack.begin();
// To compare difference to the uniformly-zero function in O(1)
// time, we make sure it does not contain any points that map to
// 0. The the property of being uniformly zero is the same as the
// property of having an empty difference.
const auto find = [&](const auto& x) {
difference[x]++;
if (0 == difference[x]) difference.erase(x);
};
const auto lose = [&](const auto& x) {
difference[x]--;
if (0 == difference[x]) difference.erase(x);
};
vector<size_t> result;
// First we initialize the difference with the first needle.size()
// items from both needle and haystack.
for (const auto& x : needle) {
lose(x);
find(*lead);
++lead;
if (lead == haystack.end()) return result;
}
size_t i = 0;
if (difference.empty()) result.push_back(i++);
// Now we iterate through the haystack with lead, lag, and i (the
// position of lag) updating difference in O(1) time at each spot.
for (; lead != haystack.end(); ++lead, ++lag, ++i) {
find(*lead);
lose(*lag);
if (difference.empty()) result.push_back(i);
}
return result;
}
int main() {
string needle, haystack;
cin >> needle >> haystack;
const auto result = AnagramLocations(needle, haystack);
for (auto x : result) cout << x << ' ';
}
import java.util.*;
public class FindAllAnagramsInAString_438{
public static void main(String[] args){
String s="abab";
String p="ab";
// String s="cbaebabacd";
// String p="abc";
System.out.println(findAnagrams(s,p));
}
public static List<Integer> findAnagrams(String s, String p) {
int i=0;
int j=p.length();
List<Integer> list=new ArrayList<>();
while(j<=s.length()){
//System.out.println("Substring >>"+s.substring(i,j));
if(isAnamgram(s.substring(i,j),p)){
list.add(i);
}
i++;
j++;
}
return list;
}
public static boolean isAnamgram(String s,String p){
HashMap<Character,Integer> map=new HashMap<>();
if(s.length()!=p.length()) return false;
for(int i=0;i<s.length();i++){
char chs=s.charAt(i);
char chp=p.charAt(i);
map.put(chs,map.getOrDefault(chs,0)+1);
map.put(chp,map.getOrDefault(chp,0)-1);
}
for(int val:map.values()){
if(val!=0) return false;
}
return true;
}
}

Efficiently count occurrences of each element from given ranges

So i have some ranges like these:
2 4
1 9
4 5
4 7
For this the result should be
1 -> 1
2 -> 2
3 -> 2
4 -> 4
5 -> 3
6 -> 2
7 -> 2
8 -> 1
9 -> 1
The naive approach will be to loop through all the ranges but that would be very inefficient and the worst case would take O(n * n)
What would be the efficient approach probably in O(n) or O(log(n))
Here's the solution, in O(n):
The rationale is to add a range [a, b] as a +1 in a, and a -1 after b. Then, after adding all the ranges, then compute the accumulated sums for that array and display it.
If you need to perform queries while adding the values, a better choice would be to use a Binary Indexed Tree, but your question doesn't seem to require this, so I left it out.
#include <iostream>
#define MAX 1000
using namespace std;
int T[MAX];
int main() {
int a, b;
int min_index = 0x1f1f1f1f, max_index = 0;
while(cin >> a >> b) {
T[a] += 1;
T[b+1] -= 1;
min_index = min(min_index, a);
max_index = max(max_index, b);
}
for(int i=min_index; i<=max_index; i++) {
T[i] += T[i-1];
cout << i << " -> " << T[i] << endl;
}
}
UPDATE: Based on the "provocations" (in a good sense) by גלעד ברקן, you can also do this in O(n log n):
#include <iostream>
#include <map>
#define ull unsigned long long
#define miit map<ull, int>::iterator
using namespace std;
map<ull, int> T;
int main() {
ull a, b;
while(cin >> a >> b) {
T[a] += 1;
T[b+1] -= 1;
}
ull last;
int count = 0;
for(miit it = T.begin(); it != T.end(); it++) {
if (count > 0)
for(ull i=last; i<it->first; i++)
cout << i << " " << count << endl;
count += it->second;
last = it->first;
}
}
The advantage of this solution is being able to support ranges with much larger values (as long as the output isn't so large).
The solution would be pretty simple:
generate two lists with the indices of all starting and ending indices of the ranges and sort them.
Generate a counter for the number of ranges that cover the current index. Start at the first item that is at any range and iterate over all numbers to the last element that is in any range. Now if an index is either part of the list of starting-indices, we add 1 to the counter, if it's an element of the ending-indices, we substract 1 from the counter.
Implementation:
vector<int> count(int** ranges , int rangecount , int rangemin , int rangemax)
{
vector<int> res;
set<int> open, close;
for(int** r = ranges ; r < ranges + sizeof(int*) * rangecount ; r++)
{
open.add((*r)[0]);
close.add((*r)[1]);
}
int rc = 0;
for(int i = rangemin ; i < rangemax ; i++)
{
if(open.count(i))
++rc;
res.add(rc);
if(close.count(i))
--rc;
}
return res;
}
Paul's answer still counts from "the first item that is at any range and iterate[s] over all numbers to the last element that is in any range." But what is we could aggregate overlapping counts? For example, if we have three (or say a very large number of) overlapping ranges [(2,6),[1,6],[2,8] the section (2,6) could be dependent only on the number of ranges, if we were to label the overlaps with their counts [(1),3(2,6),(7,8)]).
Using binary search (once for the start and a second time for the end of each interval), we could split the intervals and aggregate the counts in O(n * log m * l) time, where n is our number of given ranges and m is the number of resulting groups in the total range and l varies as the number of disjoint updates required for a particular overlap (the number of groups already within that range). Notice that at any time, we simply have a sorted list grouped as intervals with labeled count.
2 4
1 9
4 5
4 7
=>
(2,4)
(1),2(2,4),(5,9)
(1),2(2,3),3(4),2(5),(6,9)
(1),2(2,3),4(4),3(5),2(6,7),(8,9)
So you want the output to be an array, where the value of each element is the number of input ranges that include it?
Yeah, the obvious solution would be to increment every element in the range by 1, for each range.
I think you can get more efficient if you sort the input ranges by start (primary), end (secondary). So for 32bit start and end, start:end can be a 64bit sort key. Actually, just sorting by start is fine, we need to sort the ends differently anyway.
Then you can see how many ranges you enter for an element, and (with a pqueue of range-ends) see how many you already left.
# pseudo-code with possible bugs.
# TODO: peek or put-back the element from ranges / ends
# that made the condition false.
pqueue ends; // priority queue
int depth = 0; // how many ranges contain this element
for i in output.len {
while (r = ranges.next && r.start <= i) {
ends.push(r.end);
depth++;
}
while (ends.pop < i) {
depth--;
}
output[i] = depth;
}
assert ends.empty();
Actually, we can just sort the starts and ends separately into two separate priority queues. There's no need to build the pqueue on the fly. (Sorting an array of integers is more efficient than sorting an array of structs by one struct member, because you don't have to copy around as much data.)

Filter only digit sequences containing a given set of digits

I have a large list of digit strings like this one. The individual strings are relatively short (say less than 50 digits).
data = [
'300303334',
'53210234',
'123456789',
'5374576807063874'
]
I need to find out a efficient data structure (speed first, memory second) and algorithm which returns only those strings that are composed of a given set of digits.
Example results:
filter(data, [0,3,4]) = ['300303334']
filter(data, [0,1,2,3,4,5]) = ['300303334', '53210234']
The data list will usually fit into memory.
For each digit, precompute a postings list that don't contain the digit.
postings = [[] for _ in xrange(10)]
for i, d in enumerate(data):
for j in xrange(10):
digit = str(j)
if digit not in d:
postings[j].append(i)
Now, to find all strings that contain, for example, just the digits [1, 3, 5] you can merge the postings lists for the other digits (ie: 0, 2, 4, 6, 7, 8, 9).
def intersect_postings(p0, p1):
i0, i1 = next(p0), next(p1)
while True:
if i0 == i1:
yield i0
i0, i1 = next(p0), next(p1)
elif i0 < i1: i0 = next(p0)
else: i1 = next(p1)
def find_all(digits):
p = None
for d in xrange(10):
if d not in digits:
if p is None: p = iter(postings[d])
else: p = intersect_postings(p, iter(postings[d]))
return (data[i] for i in p) if p else iter(data)
print list(find_all([0, 3, 4]))
print list(find_all([0, 1, 2, 3, 4, 5]))
A string can be encoded by a 10-bit number. There are 2^10, or 1,024 possible values.
So create a dictionary that uses an integer for a key and a list of strings for the value.
Calculate the value for each string and add that string to the list of strings for that value.
General idea:
Dictionary Lookup;
for each (string in list)
value = 0;
for each character in string
set bit N in value, where N is the character (0-9)
Lookup[value] += string // adds string to list for this value in dictionary
Then, to get a list of the strings that match your criteria, just compute the value and do a direct dictionary lookup.
So if the user asks for strings that contain only 3, 5, and 7:
value = (1 << 3) || (1 << 5) || (1 << 7);
list = Lookup[value];
Note that, as Matt pointed out in comment below, this will only return strings that contain all three digits. So, for example, it wouldn't return 37. That seems like a fatal flaw to me.
Edit
If the number of symbols you have to deal with is very large, then the number of possible combinations becomes too large for this solution to be practical.
With a large number of symbols, I'd recommend an inverted index as suggested in the comments, combined with a secondary filter that removes the strings that contain extraneous digits.
Consider a function f which constructs a bitmask for each string with bit i set if digit i is in the string.
For example,
f('0') = 0b0000000001
f('00') = 0b0000000001
f('1') = 0b0000000010
f('1100') = 0b0000000011
Then I suggest storing a list of strings for each bitmask.
For example,
Bitmask 0b0000000001 -> ['0','00']
Once you have prepared this data structure (which is the same size as your original list), you can then easily access all the strings for a particular filter by accessing all lists where the bitmask is a subset of the digits in your filter.
So for your example of filter [0,3,4] you would return the lists from:
Strings containing just 0
Strings containing just 3
Strings containing just 4
Strings containing 0 and 3
Strings containing 0 and 4
Strings containing 3 and 4
Strings containing 0 and 3 and 4
Example Python Code
from collections import defaultdict
import itertools
raw_data = [
'300303334',
'53210234',
'123456789',
'5374576807063874'
]
def preprocess(raw_data):
data = defaultdict(list)
for s in raw_data:
bitmask = 0
for digit in s:
bitmask |= 1<<int(digit)
data[bitmask].append(s)
return data
def filter(data,mask):
for r in range(len(mask)):
for m in itertools.combinations(mask,r+1):
bitmask = sum(1<<digit for digit in m)
for s in data[bitmask]:
yield s
data = preprocess(raw_data)
for a in filter(data, [0,1,2,3,4,5]):
print a
Just for kicks, I have coded up Jim's lovely algorithm and the Perl is here if anyone wants to play with it. Please do not accept this as an answer or anything, pass all credit to Jim:
#!/usr/bin/perl
use strict;
use warnings;
my $Debug=1;
my $Nwords=1000;
my ($word,$N,$value,$i,$j,$k);
my (#dictionary,%Lookup);
################################################################################
# Generate "words" with random number of characters 5-30
################################################################################
print "DEBUG: Generating $Nwords word dictionary\n" if $Debug;
for($i=0;$i<$Nwords;$i++){
$j = rand(25) + 5; # length of this word
$word="";
for($k=0;$k<$j;$k++){
$word = $word . int(rand(10));
}
$dictionary[$i]=$word;
print "$word\n" if $Debug;
}
# Add some obvious test cases
$dictionary[++$i]="0" x 50;
$dictionary[++$i]="1" x 50;
$dictionary[++$i]="2" x 50;
$dictionary[++$i]="3" x 50;
$dictionary[++$i]="4" x 50;
$dictionary[++$i]="5" x 50;
$dictionary[++$i]="6" x 50;
$dictionary[++$i]="7" x 50;
$dictionary[++$i]="8" x 50;
$dictionary[++$i]="9" x 50;
$dictionary[++$i]="0123456789";
################################################################################
# Encode words
################################################################################
for $word (#dictionary){
$value=0;
for($i=0;$i<length($word);$i++){
$N=substr($word,$i,1);
$value |= 1 << $N;
}
push(#{$Lookup{$value}},$word);
print "DEBUG: $word encoded as $value\n" if $Debug;
}
################################################################################
# Do lookups
################################################################################
while(1){
print "Enter permitted digits, separated with commas: ";
my $line=<STDIN>;
my #digits=split(",",$line);
$value=0;
for my $d (#digits){
$value |= 1<<$d;
}
print "Value: $value\n";
print join(", ",#{$Lookup{$value}}),"\n\n" if defined $Lookup{$value};
}
I like Jim Mischel's approach. It has pretty efficient look up and bounded memory usage. Code in C follows:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <readline/readline.h>
#include <readline/history.h>
enum {
zero = '0',
nine = '9',
numbers = nine - zero + 1,
masks = 1 << numbers,
};
typedef uint16_t mask;
struct list {
char *s;
struct list *next;
};
typedef struct list list_cell;
typedef struct list *list;
static inline int is_digit(char c) { return c >= zero && c <= nine; }
static inline mask char2mask(char c) { return 1 << (c - zero); }
static inline mask add_char2mask(mask m, char c) {
return m | (is_digit(c) ? char2mask(c) : 0);
}
static inline int is_set(mask m, mask n) { return (m & n) != 0; }
static inline int is_set_char(mask m, char c) { return is_set(m, char2mask(c)); }
static inline int is_submask(mask sub, mask m) { return (sub & m) == sub; }
static inline char *sprint_mask(char buf[11], mask m) {
char *s = buf;
char i;
for(i = zero; i <= nine; i++)
if(is_set_char(m, i)) *s++ = i;
*s = 0;
return buf;
}
static inline mask get_mask(char *s) {
mask m=0;
for(; *s; s++)
m = add_char2mask(m, *s);
return m;
}
static inline int is_empty(list l) { return !l; }
static inline list insert(list *l, char *s) {
list cell = (list)malloc(sizeof(list_cell));
cell->s = s;
cell->next = *l;
return *l = cell;
}
static void *foreach(void *f(char *, void *), list l, void *init) {
for(; !is_empty(l); l = l->next)
init = f(l->s, init);
return init;
}
struct printer_state {
int first;
FILE *f;
};
static void *prin_list_member(char *s, void *data) {
struct printer_state *st = (struct printer_state *)data;
if(st->first) {
fputs(", ", st->f);
} else
st->first = 1;
fputs(s, st->f);
return data;
}
static void print_list(list l) {
struct printer_state st = {.first = 0, .f = stdout};
foreach(prin_list_member, l, (void *)&st);
putchar('\n');
}
static list *init_lu(void) { return (list *)calloc(sizeof(list), masks); }
static list *insert2lu(list lu[masks], char *s) {
mask i, m = get_mask(s);
if(m) // skip string without any number
for(i = m; i < masks; i++)
if(is_submask(m, i))
insert(lu+i, s);
return lu;
}
int usage(const char *name) {
fprintf(stderr, "Usage: %s filename\n", name);
return EXIT_FAILURE;
}
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
static inline void chomp(char *s) { if( (s = strchr(s, '\n')) ) *s = '\0'; }
list *load_file(FILE *f) {
char *line = NULL;
size_t len = 0;
ssize_t read;
list *lu = init_lu();
for(; (read = getline(&line, &len, f)) != -1; line = NULL) {
chomp(line);
insert2lu(lu, line);
}
return lu;
}
void read_reqs(list *lu) {
char *line;
char buf[11];
for(; (line = readline("> ")); free(line))
if(*line) {
add_history(line);
mask m = get_mask(line);
printf("mask: %s\nstrings: ", sprint_mask(buf, m));
print_list(lu[m]);
};
putchar('\n');
}
int main(int argc, const char* argv[] ) {
const char *name = argv[0];
FILE *f;
list *lu;
if(argc != 2) return usage(name);
f = fopen(argv[1], "r");
if(!f) handle_error("open");
lu = load_file(f);
fclose(f);
read_reqs(lu);
return EXIT_SUCCESS;
}
To compile use
gcc -lreadline -o digitfilter digitfilter.c
And test run:
$ cat data.txt
300303334
53210234
123456789
5374576807063874
$ ./digitfilter data.txt
> 034
mask: 034
strings: 300303334
> 0,1,2,3,4,5
mask: 012345
strings: 53210234, 300303334
> 0345678
mask: 0345678
strings: 5374576807063874, 300303334
Put each value into a set-- Eg.: '300303334'={3, 0, 4}.
Since the length of your data items are bound by a constant (50),
you can do these at O(1) time for each item using Java HashSet. The overall complexity of this phase adds up to O(n).
For each filter set, use containsAll() of HashSet to see whether
each of these data items is a subset of your filter. Takes O(n).
Takes O(m*n) in the overall where n is the number of data items and m the number of filters.

Find Second largest number in array at most n+log₂(n)−2 comparisons [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
The community reviewed whether to reopen this question 12 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most n+log₂(n)−2 comparisons.
Start with comparing elements of the n element array in odd and even positions and determining largest element of each pair. This step requires n/2 comparisons. Now you've got only n/2 elements. Continue pairwise comparisons to get n/4, n/8, ... elements. Stop when the largest element is found. This step requires a total of n/2 + n/4 + n/8 + ... + 1 = n-1 comparisons.
During previous step, the largest element was immediately compared with log₂(n) other elements. You can determine the largest of these elements in log₂(n)-1 comparisons. That would be the second-largest number in the array.
Example: array of 8 numbers [10,9,5,4,11,100,120,110].
Comparisons on level 1: [10,9] ->10 [5,4]-> 5, [11,100]->100 , [120,110]-->120.
Comparisons on level 2: [10,5] ->10 [100,120]->120.
Comparisons on level 3: [10,120]->120.
Maximum is 120. It was immediately compared with: 10 (on level 3), 100 (on level 2), 110 (on level 1).
Step 2 should find the maximum of 10, 100, and 110. Which is 110. That's the second largest element.
sly s's answer is derived from this paper, but he didn't explain the algorithm, which means someone stumbling across this question has to read the whole paper, and his code isn't very sleek as well. I'll give the crux of the algorithm from the aforementioned paper, complete with complexity analysis, and also provide a Scala implementation, just because that's the language I chose while working on these problems.
Basically, we do two passes:
Find the max, and keep track of which elements the max was compared to.
Find the max among the elements the max was compared to; the result is the second largest element.
In the picture above, 12 is the largest number in the array, and was compared to 3, 1, 11, and 10 in the first pass. In the second pass, we find the largest among {3, 1, 11, 10}, which is 11, which is the second largest number in the original array.
Time Complexity:
All elements must be looked at, therefore, n - 1 comparisons for pass 1.
Since we divide the problem into two halves each time, there are at most log₂n recursive calls, for each of which, the comparisons sequence grows by at most one; the size of the comparisons sequence is thus at most log₂n, therefore, log₂n - 1 comparisons for pass 2.
Total number of comparisons <= (n - 1) + (log₂n - 1) = n + log₂n - 2
def second_largest(nums: Sequence[int]) -> int:
def _max(lo: int, hi: int, seq: Sequence[int]) -> Tuple[int, MutableSequence[int]]:
if lo >= hi:
return seq[lo], []
mid = lo + (hi - lo) // 2
x, a = _max(lo, mid, seq)
y, b = _max(mid + 1, hi, seq)
if x > y:
a.append(y)
return x, a
b.append(x)
return y, b
comparisons = _max(0, len(nums) - 1, nums)[1]
return _max(0, len(comparisons) - 1, comparisons)[0]
The first run for the given example is as follows:
lo=0, hi=1, mid=0, x=10, a=[], y=4, b=[]
lo=0, hi=2, mid=1, x=10, a=[4], y=5, b=[]
lo=3, hi=4, mid=3, x=8, a=[], y=7, b=[]
lo=3, hi=5, mid=4, x=8, a=[7], y=2, b=[]
lo=0, hi=5, mid=2, x=10, a=[4, 5], y=8, b=[7, 2]
lo=6, hi=7, mid=6, x=12, a=[], y=3, b=[]
lo=6, hi=8, mid=7, x=12, a=[3], y=1, b=[]
lo=9, hi=10, mid=9, x=6, a=[], y=9, b=[]
lo=9, hi=11, mid=10, x=9, a=[6], y=11, b=[]
lo=6, hi=11, mid=8, x=12, a=[3, 1], y=11, b=[9]
lo=0, hi=11, mid=5, x=10, a=[4, 5, 8], y=12, b=[3, 1, 11]
Things to note:
There are exactly n - 1=11 comparisons for n=12.
From the last line, y=12 wins over x=10, and the next pass starts with the sequence [3, 1, 11, 10], which has log₂(12)=3.58 ~ 4 elements, and will require 3 comparisons to find the maximum.
I have implemented this algorithm in Java answered by #Evgeny Kluev. The total comparisons are n+log2(n)−2. There is also a good reference:
Alexander Dekhtyar: CSC 349: Design and Analyis of Algorithms. This is similar to the top voted algorithm.
public class op1 {
private static int findSecondRecursive(int n, int[] A){
int[] firstCompared = findMaxTournament(0, n-1, A); //n-1 comparisons;
int[] secondCompared = findMaxTournament(2, firstCompared[0]-1, firstCompared); //log2(n)-1 comparisons.
//Total comparisons: n+log2(n)-2;
return secondCompared[1];
}
private static int[] findMaxTournament(int low, int high, int[] A){
if(low == high){
int[] compared = new int[2];
compared[0] = 2;
compared[1] = A[low];
return compared;
}
int[] compared1 = findMaxTournament(low, (low+high)/2, A);
int[] compared2 = findMaxTournament((low+high)/2+1, high, A);
if(compared1[1] > compared2[1]){
int k = compared1[0] + 1;
int[] newcompared1 = new int[k];
System.arraycopy(compared1, 0, newcompared1, 0, compared1[0]);
newcompared1[0] = k;
newcompared1[k-1] = compared2[1];
return newcompared1;
}
int k = compared2[0] + 1;
int[] newcompared2 = new int[k];
System.arraycopy(compared2, 0, newcompared2, 0, compared2[0]);
newcompared2[0] = k;
newcompared2[k-1] = compared1[1];
return newcompared2;
}
private static void printarray(int[] a){
for(int i:a){
System.out.print(i + " ");
}
System.out.println();
}
public static void main(String[] args) {
//Demo.
System.out.println("Origial array: ");
int[] A = {10,4,5,8,7,2,12,3,1,6,9,11};
printarray(A);
int secondMax = findSecondRecursive(A.length,A);
Arrays.sort(A);
System.out.println("Sorted array(for check use): ");
printarray(A);
System.out.println("Second largest number in A: " + secondMax);
}
}
the problem is:
let's say, in comparison level 1, the algorithm need to be remember all the array element because largest is not yet known, then, second, finally, third. by keep tracking these element via assignment will invoke additional value assignment and later when the largest is known, you need also consider the tracking back. As the result, it will not be significantly faster than simple 2N-2 Comparison algorithm. Moreover, because the code is more complicated, you need also think about potential debugging time.
eg: in PHP, RUNNING time for comparison vs value assignment roughly is :Comparison: (11-19) to value assignment: 16.
I shall give some examples for better understanding. :
example 1 :
>12 56 98 12 76 34 97 23
>>(12 56) (98 12) (76 34) (97 23)
>>> 56 98 76 97
>>>> (56 98) (76 97)
>>>>> 98 97
>>>>>> 98
The largest element is 98
Now compare with lost ones of the largest element 98. 97 will be the second largest.
nlogn implementation
public class Test {
public static void main(String...args){
int arr[] = new int[]{1,2,2,3,3,4,9,5, 100 , 101, 1, 2, 1000, 102, 2,2,2};
System.out.println(getMax(arr, 0, 16));
}
public static Holder getMax(int[] arr, int start, int end){
if (start == end)
return new Holder(arr[start], Integer.MIN_VALUE);
else {
int mid = ( start + end ) / 2;
Holder l = getMax(arr, start, mid);
Holder r = getMax(arr, mid + 1, end);
if (l.compareTo(r) > 0 )
return new Holder(l.high(), r.high() > l.low() ? r.high() : l.low());
else
return new Holder(r.high(), l.high() > r.low() ? l.high(): r.low());
}
}
static class Holder implements Comparable<Holder> {
private int low, high;
public Holder(int r, int l){low = l; high = r;}
public String toString(){
return String.format("Max: %d, SecMax: %d", high, low);
}
public int compareTo(Holder data){
if (high == data.high)
return 0;
if (high > data.high)
return 1;
else
return -1;
}
public int high(){
return high;
}
public int low(){
return low;
}
}
}
Why not to use this hashing algorithm for given array[n]? It runs c*n, where c is constant time for check and hash. And it does n comparisons.
int first = 0;
int second = 0;
for(int i = 0; i < n; i++) {
if(array[i] > first) {
second = first;
first = array[i];
}
}
Or am I just do not understand the question...
In Python2.7: The following code works at O(nlog log n) for the extra sort. Any optimizations?
def secondLargest(testList):
secondList = []
# Iterate through the list
while(len(testList) > 1):
left = testList[0::2]
right = testList[1::2]
if (len(testList) % 2 == 1):
right.append(0)
myzip = zip(left,right)
mymax = [ max(list(val)) for val in myzip ]
myzip.sort()
secondMax = [x for x in myzip[-1] if x != max(mymax)][0]
if (secondMax != 0 ):
secondList.append(secondMax)
testList = mymax
return max(secondList)
public static int FindSecondLargest(int[] input)
{
Dictionary<int, List<int>> dictWinnerLoser = new Dictionary<int, List<int>>();//Keeps track of loosers with winners
List<int> lstWinners = null;
List<int> lstLoosers = null;
int winner = 0;
int looser = 0;
while (input.Count() > 1)//Runs till we get max in the array
{
lstWinners = new List<int>();//Keeps track of winners of each run, as we have to run with winners of each run till we get one winner
for (int i = 0; i < input.Count() - 1; i += 2)
{
if (input[i] > input[i + 1])
{
winner = input[i];
looser = input[i + 1];
}
else
{
winner = input[i + 1];
looser = input[i];
}
lstWinners.Add(winner);
if (!dictWinnerLoser.ContainsKey(winner))
{
lstLoosers = new List<int>();
lstLoosers.Add(looser);
dictWinnerLoser.Add(winner, lstLoosers);
}
else
{
lstLoosers = dictWinnerLoser[winner];
lstLoosers.Add(looser);
dictWinnerLoser[winner] = lstLoosers;
}
}
input = lstWinners.ToArray();//run the loop again with winners
}
List<int> loosersOfWinner = dictWinnerLoser[input[0]];//Gives all the elemetns who lost to max element of array, input array now has only one element which is actually the max of the array
winner = 0;
for (int i = 0; i < loosersOfWinner.Count(); i++)//Now max in the lossers of winner will give second largest
{
if (winner < loosersOfWinner[i])
{
winner = loosersOfWinner[i];
}
}
return winner;
}

Convert Array of Decimal Digits to an Array of Binary Digits

This is probably a quite exotic question.
My Problem is as follows:
The TI 83+ graphing calculator allows you to program on it using either Assembly and a link cable to a computer or its built-in TI-BASIC programming language.
According to what I've found, it supports only 16-Bit Integers and some emulated floats.
I want to work with a bit larger numbers however (around 64 bit), so for that I use an array with the single digits:
{1, 2, 3, 4, 5}
would be the Decimal 12345.
In binary, that's 110000 00111001, or as a binary digit array:
{1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1}
which would be how the calculator displays it.
How would i go about converting this array of decimal digits (which is too large for the calculator to display it as a native type) into an array of decimal digits?
Efficiency is not an issue. This is NOT homework.
This would leave me free to implement Addition for such arrays and such.
thanks!
Thought about it and I think I would do it with the following 'algorithm'
check the last digit (5 in the example case)
if it is odd, store (from the reverse order) a 1 in the binary array
now divide the number by 2 through the following method:
begin with the first digit and clear the 'carry' variable.
divide it by 2 and add the 'carry' variable. If the remainder is 1 (check this before you do the divide with an and&1) then put 5 in the carry
repeat untill all digits have been done
repeat both steps again untill the whole number is reduced to 0's.
the number in your binary array is the binary representation
your example:
1,2,3,4,5
the 5 is odd so we store 1 in the binary array: 1
we divide the array by 2 using the algorithm:
0,2,3,4,5 => 0,1+5,3,4,5 => 0,6,1,4,5 => 0,6,1,2+5,5 => 0,6,1,7,2
and repeat:
0,6,1,7,2 last digit is even so we store a 0: 0,1 (notice we fill the binary string from right to left)
etc
you end up with a binary
EDIT:
Just to clarify above: All I'm doing is the age old algorithm:
int value=12345;
while(value>0)
{
binaryArray.push(value&1);
value>>=1; //divide by 2
}
except in your example we don't have an int but an array which represents a (10 base) int ;^)
On way would be to convert each digit in the decimal representation to it's binary representation and then add the binary representations of all the digits:
5 = 101
40 = 101000
300 = 100101100
2000 = 11111010000
10000 = 10011100010000
101
101000
100101100
11111010000
+ 10011100010000
----------------
11000000111001
Proof of concept in C#:
Methods for converting to an array of binary digits, adding arrays and multiplying an array by ten:
private static byte[] GetBinary(int value) {
int bit = 1, len = 1;
while (bit * 2 < value) {
bit <<= 1;
len++;
}
byte[] result = new byte[len];
for (int i = 0; value > 0;i++ ) {
if (value >= bit) {
value -= bit;
result[i] = 1;
}
bit >>= 1;
}
return result;
}
private static byte[] Add(byte[] a, byte[] b) {
byte[] result = new byte[Math.Max(a.Length, b.Length) + 1];
int carry = 0;
for (int i = 1; i <= result.Length; i++) {
if (i <= a.Length) carry += a[a.Length - i];
if (i <= b.Length) carry += b[b.Length - i];
result[result.Length - i] = (byte)(carry & 1);
carry >>= 1;
}
if (result[0] == 0) {
byte[] shorter = new byte[result.Length - 1];
Array.Copy(result, 1, shorter, 0, shorter.Length);
result = shorter;
}
return result;
}
private static byte[] Mul2(byte[] a, int exp) {
byte[] result = new byte[a.Length + exp];
Array.Copy(a, result, a.Length);
return result;
}
private static byte[] Mul10(byte[] a, int exp) {
for (int i = 0; i < exp; i++) {
a = Add(Mul2(a, 3), Mul2(a, 1));
}
return a;
}
Converting an array:
byte[] digits = { 1, 2, 3, 4, 5 };
byte[][] bin = new byte[digits.Length][];
int exp = 0;
for (int i = digits.Length - 1; i >= 0; i--) {
bin[i] = Mul10(GetBinary(digits[i]), exp);
exp++;
}
byte[] result = null;
foreach (byte[] digit in bin) {
result = result == null ? digit: Add(result, digit);
}
// output array
Console.WriteLine(
result.Aggregate(
new StringBuilder(),
(s, n) => s.Append(s.Length == 0 ? "" : ",").Append(n)
).ToString()
);
Output:
1,1,0,0,0,0,0,0,1,1,1,0,0,1
Edit:
Added methods for multiplying an array by tens. Intead of multiplying the digit before converting it to a binary array, it has to be done to the array.
The main issue here is that you're going between bases which aren't multiples of one another, and thus there isn't a direct isolated mapping between input digits and output digits. You're probably going to have to start with your least significant digit, output as many least significant digits of the output as you can before you need to consult the next digit, and so on. That way you only need to have at most 2 of your input digits being examined at any given point in time.
You might find it advantageous in terms of processing order to store your numbers in reversed form (such that the least significant digits come first in the array).

Resources