Adding sequential integers to make strings unique? - algorithm

A dynamic list means elements of the list can be added, deleted, or changed.
The question is similar to how duplicate file names are handled in Windows.
For example:
file
file (1)
file (2)
file (3)
If file (2) is deleted and then another file named file is added, file (2) will be the generated file name. (I don't think that happens in Windows, though.)
Is there an elegant way to do this without searching through the whole list on every insert?

You can use a queue to store the integers which were freed, and a counter to keep track of the last one used:
#include <queue>

std::queue<int> freed;   // numbers released by deletions
int lastUsedInt = 0;     // next never-used number

void releaseNumber(int fileNumber) { // 'delete' is a reserved word in C++
    freed.push(fileNumber);
}

int getIntForNewFile() {
    if (freed.empty()) {        // if our queue is empty then there are no unused spaces
        return lastUsedInt++;   // return lastUsedInt and increment it by 1
    } else {
        int i = freed.front();  // get current front of the queue (std::queue has no top())
        freed.pop();            // delete current front of the queue
        return i;               // return the retrieved number
    }
}
Some C++ code here :)
This will "fill" the empty spots in the order they were deleted.
So if you have files 1 through 10, then delete 5 and 2: 5 will be filled first, then 2.
If you want them to be filled in order, you can use a sorted container.
In C++ that would be a std::priority_queue used as a min-heap.
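For illustration, here is a minimal C++ sketch of that min-heap variant (the function names are mine, not part of the answer); std::greater makes the priority_queue a min-heap, so the smallest freed number is always reused first:

#include <functional>
#include <queue>
#include <vector>

std::priority_queue<int, std::vector<int>, std::greater<int>> freedMin;
int nextFresh = 0; // next never-used number

int getIntForNewFile() {
    if (freedMin.empty())
        return nextFresh++;   // no gaps: hand out a fresh number
    int n = freedMin.top();   // smallest freed number
    freedMin.pop();
    return n;
}

void releaseNumber(int n) {
    freedMin.push(n);         // a freed number becomes reusable
}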

Use a min heap to store the items that get deleted. If the heap is empty, then there are no free items. Otherwise, take the first one.
A simple min heap implementation is available at http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=789
To use it:
BinaryHeap<int> MyHeap = new BinaryHeap<int>();
When you remove an item from your list, add the number to the heap:
MyHeap.Insert(number);
To get the next number:
if (MyHeap.Count > 0)
    nextNumber = MyHeap.RemoveRoot();
else
    nextNumber = List.Count;
This guarantees that you'll always get the smallest available number.

I would use a self-balancing binary tree as a set representation (e.g., the C++ standard std::set<int> container).
The set would consist of all "unallocated" numbers up to and including one greater than the largest allocated number.
Initialize the set to contain 0 (or 1, whatever your preference is).
To allocate a new number, grab the smallest element in the set and remove it. Call that number n. If the set is now empty, put n+1 into the set.
To de-allocate a number, add it to the set. Then perform the following cleanup algorithm:
Step 1: If the set has one element, stop.
Step 2: If the largest element in the set minus the second-largest element is greater than 1, stop.
Step 3: Remove the largest element.
Step 4: Goto step 2.
With this data structure and algorithm, any sequence of k allocate/deallocate operations requires O(k log k) time. (Even though the "cleanup" operation is O(k log k) time all by itself, you can only remove an element after you have inserted it, so the total time does not exceed O(k log k).)
Put another way, each allocate/deallocate takes logarithmic amortized time.
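A short C++ sketch of this scheme, transcribing the steps above directly (function names are mine):

#include <iterator>
#include <set>

std::set<int> freeSet = {0};    // all unallocated numbers up to max allocated + 1

int allocate() {
    int n = *freeSet.begin();           // smallest element in the set
    freeSet.erase(freeSet.begin());
    if (freeSet.empty())
        freeSet.insert(n + 1);          // keep one number past the largest allocated
    return n;
}

void deallocate(int n) {
    freeSet.insert(n);
    // Cleanup (steps 1-4 above): repeatedly drop the largest element
    // while it is exactly one above the second-largest.
    while (freeSet.size() > 1) {
        auto last = std::prev(freeSet.end());
        if (*last - *std::prev(last) > 1)
            break;
        freeSet.erase(last);
    }
}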

Related

How to construct the longest palindrome from a list of numbers?

I am trying to solve a question which says that we need to write a function that, given a list of numbers, finds the longest palindrome we can form using only the numbers in the list.
For eg:
If the given list is : [3,47,6,6,5,6,15,22,1,6,15]
The longest palindrome that we can return is one of length 9, such as [6,15,6,3,47,3,6,15,6].
Additionally, we have the following constraints:
One can only use an array queue, an array stack, and a chaining hash map, plus the list we are supposed to return; the function must run in linear time, and we can use only constant additional space.
My approach was the following:
Since a palindrome can be formed if we have an even number of certain values, we can iterate over all the elements in the list and store in a chaining hash map the number of times each number appears. This should take O(N) time, since each lookup in the chaining hash map takes constant time, and iterating over the list takes linear time.
Then we can iterate over all the numbers in the chaining hash map to see which numbers appear an even number of times, and accordingly just make a palindrome. In the worst case, this will take O(n) linear time.
Now there are two things I am wondering:
How should I make the actual palindrome? Like, how do I use the data structures that I am allowed to use in order to make a palindrome? I am thinking that since the queue is a FIFO data structure and the stack is LIFO, for each number that occurs an even number of times, we add it once to the queue and once to the stack, and so on and so forth. And finally, we can just dequeue everything from the queue, pop everything from the stack, and add it all to the list!
It seems that with my approach, it is taking me two linear runs to solve the question. I am wondering if there is a faster way to do this.
Any and all help will be appreciated. Thanks!
It is not possible to get a better algorithm than one that is O(n), since every number in the input has to be inspected: any of them might provide a possibility for a longer palindrome. If indeed the output must be a longest palindrome itself (and not only its length), then producing that output is itself O(n).
You have also omitted one additional thing you have to do in your algorithm: there can be one value in the final palindrome that occurs only once (in the centre). So whenever you encounter a value that occurs an odd number of times, you may reserve one occurrence of that value for putting in the middle of an odd-length palindrome. The even remainder of the occurrences can be used as usual.
As to your questions:
How should I make the actual palindrome?
There are many ways to do it. But don't forget that if you have an even number of occurrences, you should use all those occurrences, not just two. So add half of them to the queue and half of them to the stack. When the frequency is odd, then still do the same (rounded down), and log the number also as a potential centre value.
When you have done this for all values, then dump the queue and stack together in the result list as you suggested, but don't forget to put the centre value in between the two, if you identified such a centre value (i.e. when not all occurrences were even).
It seems that with my approach, it is taking me two linear runs to solve the question.
You cannot do this better than with a linear time complexity. You can save a bit of time if you use the stack also for the result, and just dump the queue onto the stack (after potentially pushing the centre value).
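For a concrete picture, here is a minimal C++ sketch of the queue-and-stack construction described above, with standard containers standing in for the asker's required structures:

#include <queue>
#include <stack>
#include <unordered_map>
#include <vector>

std::vector<int> longestPalindrome(const std::vector<int>& nums) {
    std::unordered_map<int, int> freq;      // value -> occurrence count
    for (int v : nums) ++freq[v];

    std::queue<int> firstHalf;              // FIFO: front half, in order
    std::stack<int> secondHalf;             // LIFO: mirrored back half
    bool hasCentre = false;
    int centre = 0;

    for (const auto& kv : freq) {
        for (int i = 0; i < kv.second / 2; ++i) { // use every pair, not just one
            firstHalf.push(kv.first);
            secondHalf.push(kv.first);
        }
        if (kv.second % 2 != 0) {           // odd leftover: candidate centre value
            hasCentre = true;
            centre = kv.first;
        }
    }

    std::vector<int> result;
    while (!firstHalf.empty()) { result.push_back(firstHalf.front()); firstHalf.pop(); }
    if (hasCentre) result.push_back(centre);
    while (!secondHalf.empty()) { result.push_back(secondHalf.top()); secondHalf.pop(); }
    return result;
}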
I've got a solution where it is a palindrome only at the level of whole numbers, not digits.
For the input: [51,15]
we will return [15] or [51], and not [51,15] => (5,1,1,5).
Furthermore, your example has a problem: 3 doesn't appear twice in the input, yet it appears twice in the answer.
Or maybe I didn't understand the question.
public static int[] polidrom(int[] numbers) {
    HashMap<Integer /*the number*/, Integer /*how many times it appeared*/> hash = new HashMap<>();
    boolean middleFree = false;
    int middleNumber = 0;
    Stack<Integer> stack = new Stack<>();
    for (int num : numbers) { // count how many times each number appears
        hash.merge(num, 1, Integer::sum);
    }
    for (int num : hash.keySet()) { // how many pairs of each number can be used
        int pairs = hash.get(num) / 2;
        if (hash.get(num) % 2 != 0) { // an odd leftover can go in the middle
            middleNumber = num;
            middleFree = true;
        }
        for (int i = 0; i < pairs; i++) { // push one copy per usable pair
            stack.push(num);
        }
    }
    int pairCount = stack.size();
    int[] answer = new int[2 * pairCount + (middleFree ? 1 : 0)]; // the middle doesn't need a pair
    int startPointer = 0, endPointer = answer.length - 1;
    while (!stack.isEmpty()) { // mirror each pair onto both ends
        int num = stack.pop();
        answer[startPointer++] = num;
        answer[endPointer--] = num;
    }
    if (middleFree) {
        answer[answer.length / 2] = middleNumber;
    }
    return answer;
}
Space: O(n) => {stack, hash map, answer array}
Time complexity: O(n)
You can skip the part where I used the stack and build the answer array in the same loop.
And I can't think of a way in which you would not iterate at least twice.
Hope I've helped

Binary search with gaps

Let's imagine two arrays like this:
[8,2,3,4,9,5,7]
[0,1,1,0,0,1,1]
How can I perform a binary search only on the numbers with a 1 below them, ignoring the rest?
I know this can be done in O(log n) comparisons, but my current method is slower because it has to go through all the 0s until it hits a 1.
If you hit a number with a 0 below it, you need to scan in both directions for a number with a 1 below it until you find one -- or the local search space is exhausted. As the scan for a 1 is linear, the ratio of 0s to 1s determines whether the resulting algorithm can still be faster than linear.
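As a rough sketch of that idea (assuming the valid entries are sorted among themselves; a holds the values and valid the 0/1 flags):

#include <vector>

int searchWithGaps(const std::vector<int>& a, const std::vector<int>& valid, int target) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int probe = -1;
        // scan outward from mid for the nearest index with a 1 below it
        for (int d = 0; mid - d >= lo || mid + d <= hi; ++d) {
            if (mid - d >= lo && valid[mid - d]) { probe = mid - d; break; }
            if (mid + d <= hi && valid[mid + d]) { probe = mid + d; break; }
        }
        if (probe == -1) return -1;          // no valid entries left in range
        if (a[probe] == target) return probe;
        if (a[probe] < target) lo = probe + 1;
        else hi = probe - 1;
    }
    return -1;
}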
This question is very old, but I've just discovered a wonderful little trick to solve this problem in most cases where it comes up. I'm writing this answer so that I can refer to it elsewhere:
Fast Append, Delete, and Binary Search in a Sorted Array
The need to dynamically insert or delete items from a sorted collection, while preserving the ability to search, typically forces us to switch from a simple array representation using binary search to some kind of search tree -- a far more complicated data structure.
If you only need to insert at the end, however (i.e., you always insert a largest or smallest item), or you don't need to insert at all, then it's possible to use a much simpler data structure. It consists of:
A dynamic (resizable) array of items, the item array; and
A dynamic array of integers, the set array. The set array is used as a disjoint set data structure, using the single-array representation described here: How to properly implement disjoint set data structure for finding spanning forests in Python?
The two arrays are always the same size. As long as there have been no deletions, the item array just contains the items in sorted order, and the set array is full of singleton sets corresponding to those items.
If items have been deleted, though, items in the item array are only valid if there is a root set at the corresponding position in the set array. All sets that have been merged into a single root will be contiguous in the set array.
This data structure supports the required operations as follows:
Append (O(1))
To append a new largest item, just append the item to the item array, and append a new singleton set to the set array.
Delete (amortized effectively O(log N))
To delete a valid item, first call search to find the adjacent larger valid item. If there is no larger valid item, then just truncate both arrays to remove the item and all adjacent deleted items. Since merged sets are contiguous in the set array, this will leave both arrays in a consistent state.
Otherwise, merge the sets for the deleted item and adjacent item in the set array. If the deleted item's set is chosen as the new root, then move the adjacent item into the deleted item's position in the item array. Whichever position isn't chosen will be unused from now on, and can be nulled-out to release a reference if necessary.
If less than half of the item array is valid after a delete, then deleted items should be removed from the item array and the set array should be reset to an all-singleton state.
Search (amortized effectively O(log N))
Binary search proceeds normally, except that we need to find the representative item for every test position:
int find(item_array, set_array, itemToFind) {
    int pos = 0;
    int limit = item_array.length;
    while (pos < limit) {
        int testPos = pos + floor((limit-pos)/2);
        if (item_array[find_set(set_array, testPos)] < itemToFind) {
            pos = testPos + 1; // testPos is too low
        } else {
            limit = testPos; // testPos is not too low
        }
    }
    if (pos >= item_array.length) {
        return -1; // not found
    }
    pos = find_set(set_array, pos);
    return (item_array[pos] == itemToFind) ? pos : -1;
}
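The find_set above refers to the linked single-array disjoint-set representation. As a reference point, a typical single-array version looks roughly like this (an assumption on my part, not the answer's exact code; negative entries mark roots, and path halving keeps lookups near-constant):

#include <vector>

// set_array[i] < 0  => i is a root; otherwise set_array[i] is i's parent.
int find_set(std::vector<int>& set_array, int i) {
    while (set_array[i] >= 0) {
        int p = set_array[i];
        if (set_array[p] >= 0)
            set_array[i] = set_array[p]; // path halving: point to grandparent
        i = set_array[i];                // step to the (possibly updated) parent
    }
    return i;
}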

How to "sort" elements of 2 possible values in place in linear time? [duplicate]

This question already has answers here:
Stable separation for two classes of elements in an array
(3 answers)
Closed 9 years ago.
Suppose I have a function f and array of elements.
The function returns A or B for any element; you could visualize the elements this way ABBAABABAA.
I need to sort the elements according to the function, so the result is: AAAAAABBBB
The number of A values doesn't have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you don't sort chars, you sort objects that have a single char representation.
A few more things:
the sort should take linear time - O(n),
it should be performed in place,
it should be a stable sort.
Any ideas?
Note: if the above is not possible, do you have ideas for algorithms sacrificing one of the above requirements?
If it has to be linear and in-place, you could do a semi-stable version. By semi-stable I mean that A or B could be stable, but not both. Similar to Dukeling's answer, but you move both iterators from the same side:
a = first A
b = first B
loop while next A exists
    if b < a
        swap a,b elements
        b = next B
        a = next A
    else
        a = next A
With the sample string ABBAABABAA, you get:
ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB
On each turn, if you make a swap you move both; if not, you just move a. This will keep A stable, but B will lose its ordering. To keep B stable instead, start from the end and work your way left.
It may be possible to do it with full stability, but I don't see how.
A stable sort might not be possible with the other given constraints, so here's an unstable sort that's similar to the partition step of quick-sort (a code sketch follows the steps):
1. Have 2 iterators, one starting on the left, one starting on the right.
2. While there's a B at the right iterator, decrement the iterator.
3. While there's an A at the left iterator, increment the iterator.
4. If the iterators haven't crossed each other, swap their elements and repeat from 2.
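A minimal C++ sketch of those steps, assuming f classifies each element as 'A' or 'B':

#include <utility>
#include <vector>

template <typename T, typename F>
void partitionAB(std::vector<T>& v, F f) {
    int left = 0, right = static_cast<int>(v.size()) - 1;
    while (true) {
        while (right >= 0 && f(v[right]) == 'B') --right;  // B already on the right
        while (left < right && f(v[left]) == 'A') ++left;  // A already on the left
        if (left >= right) break;                          // iterators crossed: done
        std::swap(v[left], v[right]);                      // misplaced pair: swap it
    }
}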
Let's say:
Object_Array[1...N]
Type_A objs are A1,A2,...Ai
Type_B objs are B1,B2,...Bj
i+j = N

FOR i=1:N
    if Object_Array[i] is of Type_A
        obj_A_count = obj_A_count + 1
    else
        obj_B_count = obj_B_count + 1
LOOP
Fill the resultant array with obj_A_count copies of Type_A followed by obj_B_count copies of Type_B.
The following should work in linear time for a doubly-linked list. Because up to N insertions/deletions are involved, it may take quadratic time for arrays, though.
1. Find the location where the first B should be after "sorting". This can be done in linear time by counting As.
2. Start with 3 iterators: iterA starts from the beginning of the container, iterB starts from the above location where As and Bs should meet, and iterMiddle starts one element prior to iterB.
3. With iterA, skip over As, find the first B, and move the object from iterA to the position just before iterB. Now iterA points to the next element after where the moved element used to be, and the moved element is just before iterB.
4. Continue with step 3 until you reach iterMiddle. After that, all elements between first() and iterB-1 are As.
5. Now set iterA to iterB-1.
6. Skip over Bs with iterB. When an A is found, move it to just after iterA and increment iterA.
7. Continue step 6 until iterB reaches end().
This would work as a stable sort for any container. The algorithm includes O(N) insertions/deletions, which is linear time for containers with O(1) insertion/deletion but, alas, O(N^2) for arrays. Applicability in your case depends on whether the container is an array rather than a list.
If your data structure is a linked list instead of an array, you should be able to meet all three of your constraints. You just skim through the list and accumulating and moving the "B"s will be trivial pointer changes. Pseudo code below:
sort(list) {
    node = list.head, blast = null, bhead = null
    while(node != null) {
        nextnode = node.next
        if(node.val == "a") {
            if(blast != null) {
                //move the 'a' to the front of the 'B' list
                blast.next = node.next
                if(node.next != null) node.next.prev = blast
                node.prev = bhead.prev
                node.next = bhead
                if(bhead.prev != null) bhead.prev.next = node
                else list.head = node
                bhead.prev = node
            }
        }
        else if(node.val == "b") {
            if(blast == null)
                bhead = blast = node
            else //accumulate the "b"s..
                blast = node
        }
        node = nextnode
    }
}
So, you can do this in an array, but the memcopies that emulate the list splices will make it quite slow for large arrays.
Firstly, assuming the array of A's and B's is either generated or read in, I wonder why not avoid this question entirely by simply applying f as the list is accumulated into memory, building two lists that are subsequently merged.
Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:
Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment (once this step is done, the list could technically be sorted in-place by recursively swapping the inner-blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space). Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, segments of 4,
ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's
Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.
For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)
Time for sorting the million-element segments would be - one pass to count and index (here we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(nlogn) but n here is only 4000 (0.00005 of the one-billion list count). Total time O(2n) = O(n)
This should be possible with a bit of dynamic programming.
It works a bit like counting sort, but with a key difference. Make arrays of size n for both a and b: count_a[n] and count_b[n]. Fill these arrays with how many As or Bs there have been before index i.
After just one loop, we can use these arrays to look up the correct index for any element in O(1). Like this:
// count_a[i] = number of As strictly before index i (likewise count_b);
// total_a = count_a[n-1] + (1 if the last element is an A, else 0)
int final_index(char id, int pos){
    if(id == 'A')
        return count_a[pos];
    else
        return total_a + count_b[pos];
}
Finally, to meet the total O(n) requirement, the swapping needs to be done in a smart order. One simple option is a recursive swapping procedure that doesn't actually perform any swapping until both elements would be placed in their correct final positions. EDIT: This is actually not true. Even naive swapping will have O(n) swaps. But doing this recursive strategy will give you the absolute minimum required swaps.
Note that in the general case this would be a very bad sorting algorithm, since it has a memory requirement of O(n * element value range).
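For the "smart order" of swaps, here is a rough sketch of following the permutation cycles in place, given a target[] array filled in from final_index (names are mine, for illustration):

#include <utility>
#include <vector>

void applyPermutation(std::vector<char>& v, std::vector<int>& target) {
    for (int i = 0; i < static_cast<int>(v.size()); ++i) {
        while (target[i] != i) {              // follow the cycle through position i
            std::swap(v[i], v[target[i]]);    // send the element to its final slot
            std::swap(target[i], target[target[i]]);
        }
    }
}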

Algorithm for detecting duplicates in a dataset which is too large to be completely loaded into memory

Is there an optimal solution to this problem?
Describe an algorithm for finding duplicates in a file of one million phone numbers. The algorithm, when running, would only have two megabytes of memory available to it, which means you cannot load all the phone numbers into memory at once.
My 'naive' solution would be an O(n^2) solution which iterates over the values and just loads the file in chunks instead of all at once.
For i = 0 to 999,999
    string currentVal = get the item at index i
    for j = i+1 to 999,999
        if (j - i mod fileChunkSize == 0)
            load file chunk into array
        if data[j] == currentVal
            add currentVal to duplicateList and exit for
There must be another approach where you can process the whole dataset in a smarter way and verify whether a number is duplicated. Anyone have one?
Divide the file into M chunks, each of which is large enough to be sorted in memory. Sort them in memory.
For each set of two chunks, we will then carry out the last step of mergesort on two chunks to make one larger chunk (c_1 + c_2) (c_3 + c_4) .. (c_m-1 + c_m)
Point at the first element on c_1 and c_2 on disk, and make a new file (we'll call it c_1+2).
if c_1's pointed-to element is a smaller number than c_2's pointed-to element, copy it into c_1+2 and point to the next element of c_1.
Otherwise, copy c_2's pointed-to element into c_1+2 and point to the next element of c_2.
Repeat the previous step until both arrays are empty. You only need to use the space in memory needed to hold the two pointed-to numbers. During this process, if you encounter c_1 and c_2's pointed-to elements being equal, you have found a duplicate - you can copy it in twice and increment both pointers.
The resulting m/2 arrays can be recursively merged in the same manner -- it will take log(m) of these merge steps to generate the correct array. Each number will be compared against each other number in a way that will find the duplicates. (A sketch of this merge step follows.)
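A rough sketch of the duplicate-detecting merge of two sorted chunks (stream-based; function and variable names are mine):

#include <fstream>
#include <vector>

// Merge two sorted runs into `out`; equal heads are duplicates.
void mergeChunks(std::ifstream& c1, std::ifstream& c2,
                 std::ofstream& out, std::vector<long>& dups) {
    long a, b;
    bool haveA = static_cast<bool>(c1 >> a);
    bool haveB = static_cast<bool>(c2 >> b);
    while (haveA && haveB) {
        if (a < b)      { out << a << '\n'; haveA = static_cast<bool>(c1 >> a); }
        else if (b < a) { out << b << '\n'; haveB = static_cast<bool>(c2 >> b); }
        else {                                 // equal heads: duplicate found
            dups.push_back(a);
            out << a << '\n' << b << '\n';     // copy both in, advance both
            haveA = static_cast<bool>(c1 >> a);
            haveB = static_cast<bool>(c2 >> b);
        }
    }
    while (haveA) { out << a << '\n'; haveA = static_cast<bool>(c1 >> a); }
    while (haveB) { out << b << '\n'; haveB = static_cast<bool>(c2 >> b); }
}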
Alternately, a quick and dirty solution, as alluded to by @Evgeny Kluev, is to make a Bloom filter which is as large as you can reasonably fit in memory. You can then make a list of the index of each element which fails the Bloom filter and loop through the file a second time in order to test these members for duplication.
I think Airza's solution is heading in a good direction, but since sorting is not what you want, and it is more expensive, you can do the following by combining it with angelatlarge's approach:
1. Take a chunk C that fits in memory, of size M/2.
2. Get the chunk Ci.
3. Iterate through Ci and hash each element into a hash table. If the element already exists then you know it is a duplicate, and you can mark it as a duplicate (add its index into an array or something).
4. Get the next chunk Ci+1 and check if any of its keys already exist in the hash table. If an element exists, mark it for deletion.
5. Repeat with all chunks until you know they will not contain any duplicates from chunk Ci.
6. Repeat steps 2-5 with chunk Ci+1.
7. Delete all elements marked for deletion (this could be done during the process, whatever is more appropriate; it might be more expensive to delete one at a time if you have to shift everything else around).
This runs in O((N/M)*|C|) , where |C| is the chunk size. Notice that if M > 2N, then we only have one chunk, and this runs in O(N), which is optimal for deleting duplicates.
We simply hash them and make sure that all collisions are deleted.
Edit: Per requested, I'm providing details:
* N is the number of phone numbers.
The size of the chunk will depend on the memory, it should be of size M/2.
This is the size of memory that will load a chunk of the file, since the whole file is too big to be loaded to memory.
This leaves another M/2 bytes to keep the hash table[2] and/or a duplicate list[1].
Hence, there should be N/(M/2) chunks, each of size |C| = M/2
The run time will be the number of chunks(N/(M/2)), times the size of each chunk |C| (or M/2). Overall, this should be linear (plus or minus the overhead of changing from one chunk to the other, which is why the best way to describe it is O( (N/M) * |C| )
a. Loading a chunk Ci: O(|C|).
b. Iterating through each element, testing and setting if not there: O(1) each, since insertion and lookup in the hash table take constant time.
c. If the element is already there, you can delete it.[1]
d. Get the next chunk, rinse and repeat (2N/M chunks, so O(N/M)).
[1] Removing an element might cost O(N), unless we keep a list and remove them all in one go, avoiding a shift of all the remaining elements whenever an element is removed.
[2] If the phone numbers can all be represented as integers < 2^32 - 1, we can avoid having a full hash table and just use a flag map, saving piles of memory (we'll only need N bits of memory).
Here's a somewhat detailed pseudo-code:
void DeleteDuplicate(File file, int numberOfPhones, int maxMemory)
{
    // Assume 1,000,000 phone numbers, each fitting in 32 bits.
    // Assume 2MB of memory.
    // Assume that arrays of bool are coalesced into 8 bools per byte instead of 1 bool per byte.
    int chunkSize = maxMemory / 2; // 2MB / 2 / 4-bytes per int = 1MB or 256K integers

    // numberOfPhones bits. C++ vector<bool>, for example, would be space efficient.
    // Coalesced size ~= 122KB | non-coalesced size (worst case) ~= 977KB
    bool[] exists = new bool[numberOfPhones]; // assumes phone values fall in [0, numberOfPhones)

    byte[] numberData = new byte[chunkSize];
    int fileIndex = 0;
    int bytesLoaded;
    do // O(chunkNumber)
    {
        bytesLoaded = file.GetNextBytes(chunkSize, /*out*/ numberData);
        List<int> toRemove = new List<int>(); // we still have some 30KB-odd to spare, enough for some six thousand-odd duplicates that could be found
        for (int ii = 0; ii < bytesLoaded; ii += 4) // O(chunkSize)
        {
            int phone = BytesToInt(numberData, ii);
            if (exists[phone])
                toRemove.push(ii);  // duplicate: remember its offset
            else
                exists[phone] = true;
        }
        // Remove duplicates from the chunk, back to front, then write it out.
        for (int ii = toRemove.Length - 1; ii >= 0; --ii)
            numberData.removeAt(toRemove[ii], 4);
        File.Write(fileIndex, numberData);
        fileIndex += bytesLoaded;
    } while (bytesLoaded > 0); // while still stuff to load
}
If you can store temporary files you can load the file in chunks, sort each chunk, write it to a file, and then iterate through the chunks and look for duplicates. You can easily tell if a number is duplicated by comparing it to the next number in the file and the next number in each of the chunks. Then move to the next lowest number of all of the chunks and repeat until you run out of numbers.
Your runtime is O(n log n) due to the sorting.
I like @airza's solution, but perhaps there is another algorithm to consider: maybe one million phone numbers cannot be loaded into memory at once because they are expressed inefficiently, i.e. using more bytes per phone number than necessary. In that case, you might be able to have an efficient solution by hashing the phone numbers and storing the hashes in a (hash) table. Hash tables support dictionary operations (such as in) that let you find dupes easily.
To be more concrete about it, if each phone number is 13 bytes (such as a string in the format (NNN)NNN-NNNN), the string represents one of a billion numbers. As an integer, this can be stored in 4 bytes (instead of 13 in the string format). We then might be able to store this 4 byte "hash" in a hash table, because now our 1 billion hashed numbers take up as much space as 308 million numbers, not one billion. Ruling out impossible numbers (everything in area codes 000, 555, etc) might allow us reduce the hash size further.
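A quick sketch of turning a "(NNN)NNN-NNNN" string into an integer "hash" (the function name is mine; a 64-bit accumulator avoids overflow while packing, and the result fits in 4 bytes under the answer's one-billion-numbers assumption):

#include <cctype>
#include <cstdint>
#include <string>

uint64_t packPhone(const std::string& s) {
    uint64_t n = 0;
    for (char c : s)
        if (std::isdigit(static_cast<unsigned char>(c)))
            n = n * 10 + (c - '0');   // keep only the digits
    return n;
}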

Permutations with extra restrictions

I have a set of items, for example: {1,1,1,2,2,3,3,3}, and a restricting set of sets, for example {{3},{1,2},{1,2,3},{1,2,3},{1,2,3},{1,2,3},{2,3},{2,3}}. I am looking for permutations of the items, where the first element must be 3, the second must be 1 or 2, etc.
One such permutation that fits is:
{3,1,1,1,2,2,3,3}
Is there an algorithm to count all permutations for this problem in general? Is there a name for this type of problem?
For illustration, I know how to solve this problem for certain types of "restricting sets".
Set of items: {1,1,2,2,3}; restrictions: {{1,2},{1,2,3},{1,2,3},{1,2},{1,2}}. The count here equals 2!/(2-1)!/1! * 4!/2!/2! = 2 * 6 = 12: effectively we permute the 3 first, since it is the most restrictive, and then permute the remaining items where there is room.
Also... polynomial time. Is that possible?
UPDATE: This is discussed further at the links below. The problem above is called "counting perfect matchings", and each permutation restriction above is represented by a {0,1} entry in a matrix of slots to occupants.
https://math.stackexchange.com/questions/519056/does-a-matrix-represent-a-bijection
https://math.stackexchange.com/questions/509563/counting-permutations-with-additional-restrictions
https://math.stackexchange.com/questions/800977/parking-cars-and-vans-into-car-van-and-car-van-parking-spots
All of the other solutions here are exponential time -- even for cases where they don't need to be. This problem exhibits repeated substructure, so it should be solved with dynamic programming.
What you want to do is write a class that memoizes solutions to subproblems:
class Counter {
  struct Problem {
    unordered_multiset<int> s;        // items still to be placed
    vector<unordered_set<int>> v;     // restriction sets still to be filled
  };
  int Count(Problem const& p) {
    if (p.v.size() == 0)
      return 1;
    if (m.find(p) != m.end())
      return m[p];
    // Otherwise, attack the problem by choosing either an index 'i' (notes below)
    // or a number 'n'. This code only illustrates choosing an index 'i'.
    Problem smaller_p = p;
    smaller_p.v.erase(smaller_p.v.begin() + i);
    int retval = 0;
    unordered_set<int> tried;         // consider each distinct value only once
    for (auto it = p.s.begin(); it != p.s.end(); ++it) {
      if (!tried.insert(*it).second)
        continue;                     // duplicate value, already counted
      if (p.v[i].find(*it) == p.v[i].end())
        continue;                     // value not allowed at index 'i'
      smaller_p.s.erase(smaller_p.s.find(*it)); // remove exactly one copy
      retval += Count(smaller_p);
      smaller_p.s.insert(*it);
    }
    m[p] = retval;
    return retval;
  }
  unordered_map<Problem, int> m;
};
The code illustrates choosing an index i, which should be chosen where v[i].size() is small. The other option is to choose a number n, which should be one for which there are few locations in v where it can be placed. I'd say the minimum of the two deciding factors should win.
Also, you'll have to define a hash function for Problem -- that shouldn't be too hard using boost's hash stuff.
This solution can be improved by replacing the vector with a set<>, and defining a < operator for unordered_set. This will collapse many more identical subproblems into a single map element, and further mitigate exponential blow-up.
This solution can be further improved by making Problem instances that are the same except that the numbers are rearranged hash to the same value and compare to be the same.
You might consider a recursive solution that uses a pool of digits (in the example you provide, it would be initialized to {1,1,1,2,2,3,3,3}), and decides, at the index given as a parameter, which digit to place at this index (using, of course, the restrictions that you supply).
If you like, I can supply pseudo-code.
You could build a tree.
Level 0: Create a root node.
Level 1: Append each item from the first "restricting set" as children of the root.
Level 2: Append each item from the second restricting set as children of each of the Level 1 nodes.
Level 3: Append each item from the third restricting set as children of each of the Level 2 nodes.
...
The permutation count is then the number of leaf nodes of the final tree.
Edit
It's unclear what is meant by the "set of items" {1,1,1,2,2,3,3,3}. If that is meant to constrain how many times each value can be used ("1" can be used 3 times, "2" twice, etc.) then we need one more step:
Before appending a node to the tree, remove the values used on the current path from the set of items. If the value you want to append is still available (e.g. you want to append a "1", and "1" has only been used twice so far) then append it to the tree.
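As a sketch, the whole tree walk with that availability check collapses into a simple backtracking count (names are mine; each recursion level is one restricting set):

#include <map>
#include <set>
#include <vector>

long countLeaves(const std::vector<std::set<int>>& restrictions,
                 std::map<int, int>& remaining, int level) {
    if (level == static_cast<int>(restrictions.size()))
        return 1;                       // a complete path is one valid permutation
    long total = 0;
    for (int item : restrictions[level]) {
        auto it = remaining.find(item);
        if (it == remaining.end() || it->second == 0)
            continue;                   // value already used up on this path
        --it->second;                   // take one copy
        total += countLeaves(restrictions, remaining, level + 1);
        ++it->second;                   // put it back (backtrack)
    }
    return total;
}

For the question's example, remaining would start as {1:3, 2:2, 3:3} and restrictions as the eight sets listed.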
To save space, you could build a directed graph instead of a tree.
Create a root node.
Create a node for each item in the first set, and link from the root to the new nodes.
Create a node for each item in the second set, and link from each first-set item to each second-set item.
...
The number of permutations is then the number of paths from the root node to the nodes of the final set.
