Efficiently generating all possible permutations of a linked list?

There are many algorithms for generating all possible permutations of a given set of values. Typically, those values are represented as an array, which has O(1) random access.
Suppose, however, that the elements to permute are represented as a doubly-linked list. In this case, you cannot randomly access elements in the list in O(1) time, so many permutation algorithms will experience an unnecessary slowdown.
Is there an algorithm for generating all possible permutations of a linked list with as little time and space overhead as possible?

Try to think of how you generate all permutations on a piece of paper.
You start from the rightmost number and move one position to the left until you see a number that is smaller than its right neighbour. Then you replace it with the smallest of the larger numbers to its right, and order all the remaining numbers in increasing order after it. Do this until there is nothing more to do. Put a little thought into it and you can perform each step in time linear in the number of elements.
This is in fact the typical next-permutation algorithm, as far as I know. I see no reason why it would be faster on an array than on a list.
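For concreteness, here is a minimal Python sketch of that next-permutation step on a plain list (the next_permutation name and the driver loop are mine, purely for illustration); the same swap and suffix reversal can be done by relinking nodes in a doubly-linked list:

def next_permutation(a):
    # Rearrange a into the lexicographically next permutation, in place.
    # Return False when a is already the last (fully descending) permutation.
    i = len(a) - 2
    # walk left from the right end until a[i] < a[i + 1]
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return False
    # find the rightmost element larger than a[i] and swap it in
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # the suffix is in decreasing order, so reversing it sorts it increasingly
    a[i + 1:] = reversed(a[i + 1:])
    return True

perm = [1, 2, 3]
while True:
    print(perm)
    if not next_permutation(perm):
        break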

You might want to look into the Steinhaus–Johnson–Trotter algorithm. It generates all permutations of a sequence only by swapping adjacent elements; something which you can do in O(1) in a doubly linked list.
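A rough Python sketch of the idea on an ordinary list (the function name is mine); on a doubly-linked list the adjacent swap becomes an O(1) relinking of two neighbouring nodes:

def sjt_permutations(items):
    # Steinhaus-Johnson-Trotter: every permutation differs from the
    # previous one by a single swap of adjacent elements.
    perm = list(items)
    n = len(perm)
    dirs = [-1] * n                  # -1: looking left, +1: looking right
    yield tuple(perm)
    while True:
        # find the largest "mobile" element (one pointing at a smaller neighbour)
        mobile = -1
        for i in range(n):
            j = i + dirs[i]
            if 0 <= j < n and perm[i] > perm[j]:
                if mobile == -1 or perm[i] > perm[mobile]:
                    mobile = i
        if mobile == -1:
            return
        j = mobile + dirs[mobile]
        perm[mobile], perm[j] = perm[j], perm[mobile]
        dirs[mobile], dirs[j] = dirs[j], dirs[mobile]
        # every element larger than the one just moved reverses direction
        for k in range(n):
            if perm[k] > perm[j]:
                dirs[k] = -dirs[k]
        yield tuple(perm)

print(list(sjt_permutations([1, 2, 3])))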

You could read the linked list's data into an array, which takes O(n), and then use Heap's algorithm (http://www.geekviewpoint.com/java/numbers/permutation) to find all the permutations.
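A short Python sketch of Heap's algorithm, in case it helps (the generator form is mine; any array-like container works):

def heaps_permutations(a):
    # Heap's algorithm: each successive permutation is produced by one swap.
    def generate(k):
        if k == 1:
            yield list(a)
            return
        for i in range(k):
            yield from generate(k - 1)
            if i < k - 1:
                if k % 2 == 0:
                    a[i], a[k - 1] = a[k - 1], a[i]   # k even: swap i-th with last
                else:
                    a[0], a[k - 1] = a[k - 1], a[0]   # k odd: swap first with last
    yield from generate(len(a))

print(list(heaps_permutations([1, 2, 3])))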

You can use the Factoradic Permutation Algorithm, and rearrange the node pointers accordingly to generate the resulting permutation in place without recursion.
A pseudo description:
elements_to_permute = list
temp_list = new empty list
for i = 0 to n! - 1:
    indexes[] = factoradic(i)    # the n factoradic digits of i
    for j = 0 to n - 1:
        unlink the node selected by indexes[j] from `elements_to_permute`
        and append it to `temp_list`
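A hedged Python sketch of the same idea (factoradic and ith_permutation are illustrative names; a Python list stands in for the linked list, where pop(d) would be an O(1) unlink once the node is reached):

import math

def factoradic(i, n):
    # express i (0 <= i < n!) in the factorial number system, most significant digit first
    digits = []
    for radix in range(1, n + 1):
        digits.append(i % radix)
        i //= radix
    return digits[::-1]

def ith_permutation(elements, i):
    # pick nodes out of a working copy according to the factoradic digits
    pool = list(elements)
    return [pool.pop(d) for d in factoradic(i, len(pool))]

items = ['a', 'b', 'c']
for i in range(math.factorial(len(items))):
    print(ith_permutation(items, i))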

Related

Selection sort order

My book says:
Suppose you have a group of N numbers and would like to determine the kth largest. This is known as the selection problem. Most students who have had a programming course or two would have no difficulty writing a program to solve this problem. There are quite a few “obvious” solutions. One way to solve this problem would be to read the N numbers into an array, sort the array in decreasing order.
It says that it would make sense to sort the array in decreasing order. How does that make sense? If I have an array of {1,9,3,7,4,6} and I want the greatest element, I would sort it in an increasing order so {1,3,4,6,7,9} and then return the last element. Why would the book say in decreasing order?
Because you may not want the largest element, the book says
would like to determine the kth largest
If you sort it in ascending order, how do you know what the, say, 3rd largest number is without first finding out how big the array is?
This would be easier if the array was descending, as the 3rd largest will simply be the 3rd element.
The order itself is not that important, but if you want the k-th largest element, then if you sort in descending order, it is located at the k-th position (or index k-1 if we start with index 0), whereas if we sort in ascending order, it is located at position n-k+1 (or index n-k if the index starts at 0).
For lazy sorting algorithms (like the ones in Haskell and C# Linq's .OrderBy), this can in fact have implications with respect to time complexity. If we implement a lazy selection sort algorithm (so a generator), then this will run in O(k×n) instead of O(n²). If we use for example a lazy variant of QuickSort, it will take O(n + k log n) to obtain the first k elements.
In a language like Haskell, where laziness is really a key feature, one typically does not only aim to minimize the time complexity of the algorithm producing the entire result, but also producing subsets of the result.
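As an illustration of that laziness argument, here is a small Python generator as a stand-in (the names are mine); pulling only the first k values out of it costs O(k·n) rather than the O(n²) of a full selection sort:

from itertools import islice

def lazy_selection_desc(xs):
    # yields elements in decreasing order, one O(n) scan per element produced
    remaining = list(xs)
    while remaining:
        biggest = max(remaining)
        remaining.remove(biggest)    # removes a single occurrence
        yield biggest

data = [1, 9, 3, 7, 4, 6]
k = 3
print(list(islice(lazy_selection_desc(data), k))[-1])   # 6, the 3rd largest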

Determine if a list of compare-and-swap instructions will sort a list of length N

Say I have a list of pairs of indices into an array of length N. I want to determine whether an arbitrarily ordered list is sorted after doing
for pair in pairs:
    if list_to_sort[pair.first] > list_to_sort[pair.second]:
        swap(
            element_a_index=pair.first,
            element_b_index=pair.second,
            list=list_to_sort
        )
Obviously, I could test all permutations of the N-element list. Is there a faster way? And if there is, what is it? What is it called? Is it provably the fastest solution?
What you're describing is an algorithm called a sorting network. Unfortunately, it's known that the problem of determining whether a series of swaps yields a valid sorting network is co-NP-complete, meaning that unless P = NP there is no polynomial-time algorithm for checking whether your particular choice of pairs will work correctly.
Interestingly, though, you don't need to try all possible permutations of the input. There's a result called the zero-one principle that states that as long as your sorting network will correctly sort a list of 0s and 1s, it will sort any input sequence correctly. Consequently, rather than trying all n! possible permutations of the inputs, you can check all 2^n possible sequences of 0s and 1s. This is still pretty infeasible if n is very large, but might be useful if your choice of n is small.
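A small Python sketch of that check, under the assumption that the network is given as (i, j) index pairs like in the question (the is_sorting_network name is mine):

from itertools import product

def is_sorting_network(pairs, n):
    # zero-one principle: the network sorts every length-n input
    # iff it sorts every 0/1 sequence of length n (2**n checks instead of n!)
    for bits in product((0, 1), repeat=n):
        seq = list(bits)
        for i, j in pairs:
            if seq[i] > seq[j]:
                seq[i], seq[j] = seq[j], seq[i]
        if any(seq[k] > seq[k + 1] for k in range(n - 1)):
            return False
    return True

# a valid 3-input network: compare-and-swap (0,1), (1,2), (0,1)
print(is_sorting_network([(0, 1), (1, 2), (0, 1)], 3))   # True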
As for the best possible way to build a sorting network - this is an open problem! There are many good sorting networks that run quickly on most inputs, and there are also a few sorting networks that are known to be optimal. Searching for "optimal sorting network" might turn up some useful links.
Hope this helps!

Number of different elements in an array

Is it possible to compute the number of different elements in an array in linear time and constant space? Let us say it's an array of long integers, and you cannot allocate an array with an entry for every possible long value.
P.S. Not homework, just curious. I've got a book that sort of implies that it is possible.
This is the element uniqueness problem, for which the lower bound is Ω(n log n) in comparison-based models. The obvious hashing or bucket-sorting solutions all require linear space too, so I'm not sure this is possible.
You can't use constant space. You can use O(number of different elements) space; that's what a HashSet does.
You can use any sorting algorithm, then count the number of positions in the sorted array where adjacent elements differ (plus one for the first element).
I do not think this can be done in linear time. One way to solve it in O(n log n) is to first sort the array (after which the counting becomes trivial).
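A minimal Python sketch of that sort-then-count approach (the helper name is mine):

def count_distinct_by_sorting(values):
    # O(n log n): sort, then count the positions where the value changes
    xs = sorted(values)
    return sum(1 for i, v in enumerate(xs) if i == 0 or v != xs[i - 1])

print(count_distinct_by_sorting([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))   # 7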
If you are guaranteed that the numbers in the array are bounded above and below, by say a and b, then you could allocate an array of size b - a + 1 and use it to keep track of which numbers have been seen.
i.e., you would move through your input array, take each number, and mark true in your tracking array at that number's offset. You would increment a counter of distinct numbers only when you encounter a number whose position in your tracking array is still false.
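A Python sketch of that bounded-range tally (names are mine; note the space is O(b - a), not constant):

def count_distinct_bounded(values, a, b):
    # assumes every value lies in the known range [a, b]
    seen = [False] * (b - a + 1)
    distinct = 0
    for v in values:
        if not seen[v - a]:
            seen[v - a] = True
            distinct += 1
    return distinct

print(count_distinct_bounded([4, 2, 4, 9, 2], 0, 10))   # 3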
Assuming we can partially destroy the input, here's an algorithm for n words of O(log n) bits.
Find the element of order sqrt(n) via linear-time selection. Partition the array using this element as a pivot (O(n)). Using brute force, count the number of different elements in the partition of length sqrt(n). (This is O(sqrt(n)^2) = O(n).) Now use an in-place radix sort on the rest, where each "digit" is log(sqrt(n)) = log(n)/2 bits and we use the first partition to store the digit counts.
If you consider streaming algorithms only ( http://en.wikipedia.org/wiki/Streaming_algorithm ), then it's impossible to get an exact answer with o(n) bits of storage via a communication complexity lower bound ( http://en.wikipedia.org/wiki/Communication_complexity ), but possible to approximate the answer using randomness and little space (Alon, Matias, and Szegedy).
This can be done with a bucket approach if you assume that there are only a constant number of different values. Make a flag for each value (still constant space). Traverse the list and flag the values you encounter. If you happen to flag an already flagged value, you've found a duplicate. You may have to scan all the flags for each element in the list, but since their number is constant, that is still linear time.

Is it possible to find two numbers whose difference is minimum in O(n) time

Given an unsorted integer array, and without making any assumptions on the numbers in the array: is it possible to find two numbers whose difference is minimum in O(n) time?
Edit: Difference between two numbers a, b is defined as abs(a-b)
Find smallest and largest element in the list. The difference smallest-largest will be minimum.
If you're looking for nonnegative difference, then this is of course at least as hard as checking if the array has two equal elements. This is called the element uniqueness problem, and without any additional assumptions (like limiting the size of integers, or allowing operations other than comparison) it requires Ω(n log n) time. It is the 1-dimensional case of finding the closest pair of points.
I don't think you can do it in O(n). The best I can come up with off the top of my head is to sort them (which is O(n log n)) and find the minimum difference of adjacent pairs in the sorted list (which adds another O(n)).
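That sort-then-scan idea in a few lines of Python (the function name is mine):

def min_difference(xs):
    # O(n log n): after sorting, the closest pair must be adjacent
    ys = sorted(xs)
    return min(b - a for a, b in zip(ys, ys[1:]))

print(min_difference([30, 5, 20, 9]))   # 4  (between 5 and 9)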
I think it is possible. The secret is that you don't actually have to sort the list, you just need to create a tally of which numbers exist. This may count as "making an assumption" from an algorithmic perspective, but not from a practical perspective. We know the ints are bounded by a min and a max.
So, create an array of 2 bit elements, 1 pair for each int from INT_MIN to INT_MAX inclusive, set all of them to 00.
Iterate through the entire list of numbers. For each number in the list, if the corresponding 2 bits are 00 set them to 01. If they're 01 set them to 10. Otherwise ignore. This is obviously O(n).
Next, if any of the 2-bit entries is set to 10, that is your answer: the minimum distance is 0 because the list contains a repeated number. If not, scan through the tally and find the minimum distance between flagged values. Many people have already pointed out there are simple O(n) algorithms for this.
So O(n) + O(n) = O(n).
Edit: responding to comments.
Interesting points. I think you could achieve the same results without making any assumptions by finding the min/max of the list first and using a sparse array ranging from min to max to hold the data. Takes care of the INT_MIN/MAX assumption, the space complexity and the O(m) time complexity of scanning the array.
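A hedged Python sketch along those lines, using a count array over [min, max] (names are mine; time and space are O(n + m) where m is the value range, so this is only linear when the range is bounded):

def min_difference_counting(xs):
    lo, hi = min(xs), max(xs)
    counts = [0] * (hi - lo + 1)
    for x in xs:
        counts[x - lo] += 1
        if counts[x - lo] > 1:
            return 0                  # a repeated value: minimum distance is 0
    best = prev = None
    for v, c in enumerate(counts):
        if c:
            if prev is not None and (best is None or v - prev < best):
                best = v - prev
            prev = v
    return best

print(min_difference_counting([30, 5, 20, 9]))   # 4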
The best I can think of is to counting sort the array (possibly combining equal values) and then do the sorted comparisons -- bin sort is O(n + M) (M being the number of distinct values). This has a heavy memory requirement, however. Some form of bucket or radix sort would be intermediate in time and more efficient in space.
Sort the list with radixsort (which is O(n) for integers), then iterate and keep track of the smallest distance so far.
(I assume your integer is a fixed-bit type. If they can hold arbitrarily large mathematical integers, radixsort will be O(n log n) as well.)
It seems to be possible to sort an unbounded set of integers in O(n·sqrt(log log n)) time. After sorting, it is of course trivial to find the minimal difference in linear time.
But I can't think of any algorithm to make it faster than this.
No, not without making assumptions about the numbers/ordering.
It would be possible given a sorted list though.
I think the answer is no, and the proof is similar to the proof that you cannot sort faster than n lg n: you have to compare all of the elements, i.e. build a comparison tree, which implies an Ω(n lg n) lower bound.
EDIT. OK, if you really want to argue, then the question does not say whether it should be a Turing machine or not. With quantum computers, you can do it in linear time :)

Fast algorithm for finding next smallest and largest number in a set

I have a set of positive numbers. Given a number not in the set, I want to find the next smallest and next largest numbers that are in the set. The only way I can think to do it now is to find the next smallest by decreasing by 1 until I find a number in the set, and then do the same for finding the next largest.
Motivation: I have a bunch of data in a hashmap, keyed by dates. I don't have a datapoint for every single date. If I have data for, say, 10/01/2000 as 60 and 10/05/2000 as 68, and I ask for 10/02/2000, I want to linearly interpolate. I should get 62.
It depends on if your set is sorted.
If your set is unsorted then finding the closest (higher and lower) is an O(n) operation and a fairly simple algorithm.
If your set is sorted then you can use a modified bisection search to find the answer in O(log n), which is obviously a lot better particularly on larger sets.
If you're doing this repeatedly it might be worth sorting the set, which incurs an O(n log n) cost that might be once off or not depending on how often the set changes. Some kind of tree sort may help improve future sorts as new items are added.
All this boils down to is binary search, provided you can get your data sorted. There are two options.
Sorted Container
If you keep your numbers in a sorted container, this is pretty easy. Instead of using a HashMap, put the data in a TreeMap, then you can efficiently find the next lower or next higher element. Java even has methods to do exactly what you want:
higherKey(K)
lowerKey(K)
This is efficient because TreeMap uses a red-black tree (a kind of balanced binary search tree) internally. higherKey and lowerKey simply start at the root and traverse the tree to find where your element should go.
I'm not sure what language you're using, but in C++ you would use std::map, and the analogous methods are:
iterator lower_bound(const key_type& k)
iterator upper_bound(const key_type& k)
Array + Sorting
If you don't want to keep your data sorted all the time, you can always dump your data into an array (or any random access container), use sort, and then use the STL's binary search routines on the array:
lower_bound
upper_bound
In Java the analog would be to dump things into an ArrayList, call Java's sort(), then use binarySearch().
All the search routines here are O(log n) time. The cost of keeping your data sorted is O(n log n) with either a sorted container or with the array. With a sorted container, the cost is amortized over n insertions; with the array you pay it all at once when you call sort().
If you don't want to sort things at all, you can always use a linear search, but you will pay if you use this a lot, as it's an O(n) algorithm.
Put your data items into a tree, like an AVL tree, a red-black tree, or a B+/B- tree. Then you can search the ordered values.
Sort the numbers, then perform binary search on each key to bisect the set. You can then find which numbers are on either side of your missing key.
Convert the set to a list and sort it, then run a binary search for the number not in the set. The result will be the insertion point, i.e. the position at which the number would be if it were present. If you call that n, then the element at index n-1 of the sorted list is the next smallest number and the element at index n of the sorted list is the next largest number.
You can also do this by keeping the set in sorted order as you construct it, then it becomes an easy matter to search for the insertion point. This approach is used by e.g. the floorEntry() and ceilingEntry() methods of Java's TreeMap.
Keep your set as a sorted list/array and perform bisection-search: e.g., in Python, a sorted list and the bisect module from the standard Python library match your needs to the hilt.
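For example, with Python's bisect module (the neighbours helper is mine, assuming the key itself is not in the sorted list):

import bisect

def neighbours(sorted_keys, key):
    i = bisect.bisect_left(sorted_keys, key)
    lower = sorted_keys[i - 1] if i > 0 else None
    upper = sorted_keys[i] if i < len(sorted_keys) else None
    return lower, upper

days = [1, 5, 9]                 # e.g. day offsets of the known data points
print(neighbours(days, 2))       # (1, 5)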
If you get the keys in an array, you can sort the array and find the index of the last element that is less than the desired element. Then you know the index of the key directly before your desired point, and the next element after that is the one directly after.
That should give you enough to interpolate.
(The data structure used need not be an array, anything that will sort is fine. A balanced binary tree, as suggested by others, would be ideal, especially if you plan to reuse the data later).
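Once the neighbouring keys are known, the interpolation itself is one line; using the numbers from the question (day 1 -> 60, day 5 -> 68, asking about day 2):

def lerp(x0, y0, x1, y1, x):
    # linear interpolation between the data points on either side of x
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(lerp(1, 60, 5, 68, 2))     # 62.0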
Finding the n-th smallest element in an unsorted set is O(n) (the selection algorithm). Although here you can boil it down to a simpler, less general algorithm if you always want the smallest and next smallest elements. But in general, finding the smallest, second smallest, etc. element within an unsorted list is O(n). (You should have been taught this in your algorithms class...)
Sorting a set, and then indexing the element is O(n log n)
Finding an element in a sorted set is O(log n) (binary search)
If you know that there will always be a data point for, say, each week, then keep your HashMap as it is and do what you suggest... That will be a constant time operation since you will be doing 14 hash table lookups (probing 7 days on each side of your search date), each taking O(1) primitive operations.
If you don't know how dense your data is and you can keep it in RAM, then put it into a balanced tree structure as suggested by many others. But this can be costly if you have very many dates and if you have to load the data over the network from a database.
