Complexity analysis for the permutations algorithm

I'm trying to understand the time and space complexity of an algorithm for generating an array's permutations. Given a partially built permutation where k out of n elements are already selected, the algorithm selects element k+1 from the remaining n-k elements and calls itself to select the remaining n-k-1 elements:
public static List<List<Integer>> permutations(List<Integer> A) {
    List<List<Integer>> result = new ArrayList<>();
    permutations(A, 0, result);
    return result;
}

public static void permutations(List<Integer> A, int start, List<List<Integer>> result) {
    if (A.size() - 1 == start) {
        result.add(new ArrayList<>(A));
        return;
    }
    for (int i = start; i < A.size(); i++) {
        Collections.swap(A, start, i);
        permutations(A, start + 1, result);
        Collections.swap(A, start, i);
    }
}
My thoughts are that in each call we swap the collection's elements at most 2n times, where n is the number of elements to permute, and make n recursive calls. So the running time seems to fit the recurrence relation
T(n) = nT(n-1) + n = n[(n-1)T(n-2) + (n-1)] + n = ... = n + n(n-1) + n(n-1)(n-2) + ... + n! = n![1/(n-1)! + 1/(n-2)! + ... + 1] ≈ e * n!,
hence the time complexity is O(n!), and the space complexity is O(max(n!, n)), where n! is the total number of permutations and n is the height of the recursion tree.
This problem is taken from the Elements of Programming Interviews book, and they're saying that the time complexity is O(n*n!) because "The number of function calls C(n)=1+nC(n-1) ... [which solves to] O(n!) ... [and] ... we do O(n) computation per call outside of the recursive calls".
Which time complexity is correct?

The time complexity of this algorithm, counted by the number of basic operations performed, is Θ(n * n!). Think about the size of the result list when the algorithm terminates: it contains n! permutations, each of length n, and we cannot create a list with n * n! total elements in less than that amount of time. The space complexity is the same, since the recursion stack only ever holds O(n) calls at a time, so the size of the output list dominates the space complexity.
If you count only the number of recursive calls to permutations(), the function is called O(n!) times, although this is usually not what is meant by 'time complexity' without further specification. In other words, you can generate all permutations in O(n!) time, as long as you don't read or write those permutations after they are generated.
The part where your derivation of run-time breaks down is in the definition of T(n). If you define T(n) as 'the run-time of permutations(A, start) when the input, A, has length n', then you can not define it recursively in terms of T(n-1) or any other function of T(), because the length of the input in all recursive calls is n, the length of A.
A more useful way to define T(n) is by specifying it as the run-time of permutations(A', start), when A' is any permutation of a fixed, initial array A, and A.length - start == n. It's easy to write the recurrence relation here:
T(x) = x * T(x-1) + O(x), if x > 1
T(1) = O(A.length)
This takes into account the fact that the last recursive call, T(1), has to perform O(A.length) work to copy that array to the output, and this new recurrence gives the result from the textbook.
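As a sanity check, here is a small driver (a sketch, not from the book or the original post; it assumes the permutations() method from the question plus the usual java.util imports: List, ArrayList, Arrays) that makes the n * n! output size concrete:
public static void main(String[] args) {
    List<Integer> A = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
    List<List<Integer>> result = permutations(A);
    int totalElements = 0;
    for (List<Integer> p : result) {
        totalElements += p.size();
    }
    System.out.println(result.size());  // 24, i.e. 4!
    System.out.println(totalElements);  // 96, i.e. 4 * 4!
}
Building the result list alone touches n * n! integers, which is where the Θ(n * n!) bound comes from.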

Related

Search a Sorted Array for First Occurrence of K

I'm trying to solve question 11.1 in Elements of Programming Interviews (EPI) in Java: Search a Sorted Array for First Occurrence of K.
The problem description from the book:
Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array.
The solution they provide in the book is a modified binary search algorithm that runs in O(log n) time. I wrote my own algorithm, also based on a modified binary search, with a slight difference: it uses recursion. The problem is I don't know how to determine the time complexity of my algorithm. My best guess is that it will run in O(log n) time, because each time the function is called it reduces the size of the candidate values by half. I've tested my algorithm against the 314 EPI test cases that are provided by the EPI Judge, so I know it works; I just don't know the time complexity. Here is the code:
public static int searchFirstOfKUtility(List<Integer> A, int k, int Lower, int Upper, Integer Index) {
    while (Lower <= Upper) {
        int M = Lower + (Upper - Lower) / 2;
        if (A.get(M) < k) {
            Lower = M + 1;
        } else if (A.get(M) == k) {
            Index = M;
            if (Lower != Upper) {
                Index = searchFirstOfKUtility(A, k, Lower, M - 1, Index);
            }
            return Index;
        } else {
            Upper = M - 1;
        }
    }
    return Index;
}
Here is the code that the tests cases call to exercise my function:
public static int searchFirstOfK(List<Integer> A, int k) {
    Integer foundKey = -1;
    return searchFirstOfKUtility(A, k, 0, A.size() - 1, foundKey);
}
So, can anyone tell me what the time complexity of my algorithm would be?
Assuming that passing arguments is O(1) instead of O(n), performance is O(log(n)).
The usual theoretical approach for analyzing recursion is to apply the Master Theorem. It says that if the performance of a recursive algorithm follows a relation of the form:
T(n) = a T(n/b) + f(n)
then there are 3 cases. In plain English they correspond to:
Performance is dominated by all the calls at the bottom of the recursion, so is proportional to how many of those there are.
Performance is equal between each level of recursion, and so is proportional to how many levels of recursion there are, times the cost of any layer of recursion.
Performance is dominated by the work done in the very first call, and so is proportional to f(n).
You are in case 2. Each recursive call costs the same, and so performance is dominated by the fact that there are O(log(n)) levels of recursion times the cost of each level. Assuming that passing a fixed number of arguments is O(1), that will indeed be O(log(n)).
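Concretely, treating each halving of the candidate range as one step, a sketch of the recurrence for this search (under the same O(1) argument-passing assumption) is:
T(n) = T(n/2) + O(1), for n > 1
T(1) = O(1)
Here a = 1, b = 2 and f(n) = O(1), which is case 2 of the theorem and solves to Θ(log(n)).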
Note that this assumption is true for Java because you don't make a complete copy of the array before passing it. But it is important to be aware that it is not true in all languages. For example I recently did a bunch of work in PL/pgSQL, and there arrays are passed by value. Meaning that your algorithm would have been O(n log(n)).

Time Complexity of a recursive function where the base case isn't O(1)

Most recursive functions I have seen being asked about (e.g. Fibonacci or Hanoi) have had O(1) base cases, but what would the time complexity be if the base case were not O(1) but O(n) instead?
For example, a recursive Fibonacci with O(n) base case:
class fibonacci {
    static int fib(int n) {
        if (n <= 1) {
            // intended O(n) base case
            for (int i = 0; i < n; i++) {
                // something
            }
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        int n = 9;
        System.out.println(fib(n));
    }
}
The base case for the function that you’ve written here actually still has time complexity O(1). The reason for this is that if the base case triggers here, then n ≤ 1, so the for loop here will run at most once.
Because base cases typically trigger when the input size is small, it's comparatively rare to see a base case whose runtime is, say, O(n), where n is the size of the original input to the algorithm. That would require the base case's cost to depend on the original input size rather than on the (by then small) remaining input, which can happen but is somewhat unusual.
A more common occurrence - albeit one I think is still pretty uncommon - would be for a recursive function to have two different parameters to it (say, n and k), where the recursion reduces n but leaves k unmodified. For example, imagine taking the code you have here and replacing the for loop on n in the base case with a for loop on k in the base case. What happens then?
This turns out to be an interesting question. In the case of this particular problem, the total work done will be O(k) times the number of base cases triggered, plus O(1) times the number of non-base-case recursive calls. For the Fibonacci recursion, the number of base cases triggered when computing F(n) is F(n+1), and there are F(n+1) - 1 non-base-case calls, so the overall runtime would be Θ(k * F(n+1) + F(n+1)) = Θ(k * φ^n). For the Towers of Hanoi, you'd similarly see a scaling effect where the overall runtime would be Θ(k * 2^n). But for other recursive functions the runtime might vary in different ways, depending on how those functions are structured.
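For concreteness, here is a minimal sketch of that hypothetical variant; the extra parameter k and the busy-work loop are illustrative, not from the original question:
class fibonacciWithK {
    static int fib(int n, int k) {
        if (n <= 1) {
            int busyWork = 0;
            for (int i = 0; i < k; i++) {
                busyWork++; // O(k) work performed in every base case
            }
            return n;
        }
        return fib(n - 1, k) + fib(n - 2, k);
    }

    public static void main(String[] args) {
        // fib(9) = 34; each of the F(10) = 55 base-case hits costs O(k),
        // so the total runtime scales like Θ(k * φ^n).
        System.out.println(fib(9, 1000));
    }
}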
Hope this helps!

Find an algorithm for sorting integers with time complexity O(n + k*log(k))

Design an algorithm that sorts n integers where there are duplicates. The total number of different numbers is k. Your algorithm should have time complexity O(n + k*log(k)). The expected time is enough. For which values of k does the algorithm become linear?
I am not able to come up with a sorting algorithm for integers which satisfies the condition that it must be O(n + k*log(k)). I am not a very advanced programmer, but in the problem before this one I was supposed to come up with an algorithm for all numbers x_i in a list, 0 ≤ x_i ≤ m, such that the algorithm was O(n+m), where n was the number of elements in the list and m was the value of the biggest integer in the list. I solved that problem easily with counting sort, but I struggle with this one. The condition that makes it the most difficult for me is the k*log(k) term in the big-O notation; if that were n*log(n) instead, I would be able to use merge sort, right? But that's not possible now, so any ideas would be very helpful.
Thanks in advance!
Here is a possible solution:
Using a hash table, count the number of unique values and the number of occurrences of each value. This should have a complexity of O(n).
Enumerate the hash table, storing the unique values into a temporary array. Complexity is O(k).
Sort this array with a standard algorithm such as mergesort: complexity is O(k*log(k)).
Create the resulting array by replicating each element of the sorted array of unique values the number of times recorded in the hash table. Complexity is O(n) + O(k).
The combined complexity is O(n + k*log(k)).
For example, if k is a small constant (more generally, whenever k*log(k) is O(n)), sorting an array of n values converges toward linear time as n becomes larger and larger.
If during the first phase, where k is computed incrementally, it appears that k is not significantly smaller than n, drop the hash table and just sort the original array with a standard algorithm.
The runtime of O(n + k*log(k)) indicates (like addition in runtimes often does) that you have two subroutines, one which runs in O(n) and another that runs in O(k*log(k)).
You can first count the frequency of the elements in O(n) (for example in a HashMap; look this up if you're not familiar with it, it's very useful).
Then you just sort the unique elements, of which there are k. This sorting runs in O(k*log(k)); use any sorting algorithm you want.
At the end, replace each unique element with as many copies of it as its count in the map you created in step 1.
A possible Java solution can look like this:
public List<Integer> sortArrayWithDuplicates(List<Integer> arr) {
    // O(n): collect the unique values and count how often each one occurs
    Set<Integer> set = new HashSet<>(arr);
    Map<Integer, Integer> freqMap = new HashMap<>();
    for (Integer i : arr) {
        freqMap.put(i, freqMap.getOrDefault(i, 0) + 1);
    }
    List<Integer> withoutDups = new ArrayList<>(set);

    // O(k*log(k)): sort only the k distinct elements
    Collections.sort(withoutDups);

    // O(n + k): expand each distinct value according to its frequency
    List<Integer> result = new ArrayList<>();
    for (Integer i : withoutDups) {
        int c = freqMap.get(i);
        for (int j = 0; j < c; j++) {
            result.add(i);
        }
    }
    return result;
}
The time complexity of the above code is O(n + k*log(k)) and solution is in the same line as answered above.
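For instance, with a hypothetical input such as Arrays.asList(5, 3, 5, 1, 3, 5), the method above should return [1, 3, 3, 5, 5, 5]; here n = 6 and k = 3 distinct values.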

Runtime analysis of memoized recursive function

The code below computes whether s1 can be reduced to t, and if so, in how many ways.
Let's say the length of s1 is n and the length of t is m. The worst-case runtime of the code below is O(n^m) without memoization. Say we memoize the sub-problems of s1, i.e. the substrings that recur. Then the runtime is O(m*n), since we need to recurse m times for each n. Is this reasoning correct?
static int distinctSeq(String s1, String t) {
    if (s1.length() == t.length()) {
        if (s1.equals(t))
            return 1;
        else
            return 0;
    }
    int count = 0;
    for (int i = 0; i < s1.length(); i++) {
        // remove the i-th character and recurse on the shorter string
        String ss = s1.substring(0, i) + s1.substring(i + 1);
        count += distinctSeq(ss, t);
    }
    return count;
}
As #meowgoesthedog already mentioned your initial solution has a time complexity of O(n!/m!):
If you are starting with s1 of length n, and n > m then you can go into n different states by excluding one symbol from the original string.
You will continue doing it until your string has the length of m. The number of ways to come to the length of m from n using the given algorithm is n*(n-1)*(n-2)*...*(m+1), which is effectively n!/m!
For each string of length m formed by excluding symbols from initial string of length n you will have to compare string derived from s1 and t, which will require m operations (length of the strings), so the complexity from the previous step should be multiplied by m, but considering that you have factorial in the big-O, another *m won't change the asymptotic complexity.
Now about the memoization. If you add memoization, then the algorithm transitions only to states that weren't already visited, which means that the task is to count the number of unique substrings of s1. For simplicity we will consider that all symbols of s1 are different. The number of states of length x is the number of ways of removing some n-x symbols from s1, disregarding the order, which is a binomial coefficient: C(n,x) = n!/((n-x)! * x!).
The algorithm will transition through all lengths between n and m, so the overall time complexity would be Sum(x=m..n) C(n,x) = C(n,m) + C(n,m+1) + ... + C(n,n). Since we are counting asymptotic complexity, we are interested in the largest member of that sum, which is the one with x as close as possible to n/2. If m is less than n/2, then C(n, n/2) is present in the sum; otherwise C(n,m) is the largest element in it. So the complexity of the algorithm with memoization is O(max(C(n, n/2), C(n, m))).
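A minimal sketch of the memoized variant discussed above (the class name and the map are illustrative, not from the original post; since t is fixed, caching on the derived string alone is enough, and memoization only speeds up the computation, it does not change the returned count):
import java.util.HashMap;
import java.util.Map;

class DistinctSeqMemo {
    static Map<String, Integer> memo = new HashMap<>();

    static int distinctSeq(String s1, String t) {
        if (s1.length() == t.length()) {
            return s1.equals(t) ? 1 : 0;
        }
        Integer cached = memo.get(s1);
        if (cached != null) {
            return cached; // each unique substring is expanded only once
        }
        int count = 0;
        for (int i = 0; i < s1.length(); i++) {
            String ss = s1.substring(0, i) + s1.substring(i + 1);
            count += distinctSeq(ss, t);
        }
        memo.put(s1, count);
        return count;
    }

    public static void main(String[] args) {
        System.out.println(distinctSeq("rabbbit", "rabbit")); // 3
    }
}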

Median Algorithm in O(log n)

How can we remove the median of a set with time complexity O(log n)? Any ideas?
If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary sequence, it will not be possible to identify the median with certainty without examining the majority of the items. If one has examined most, but not all, of the items, that will allow one to guarantee that the median will be within some range [if the list contains duplicates, the upper and lower bounds may match], but examining the majority of the items in a list implies O(n) item retrievals.
If one has the information in a collection which is not fully ordered, but where certain ordering relationships are known, then the time required may require anywhere between O(1) and O(n) item retrievals, depending upon the nature of the known ordering relation.
For unsorted lists, repeatedly do O(n) partial sort until the element located at the median position is known. This is at least O(n), though.
Is there any information about the elements being sorted?
For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can trivially sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm overall. Or, finally, there are more clever median selection algorithms that work by partitioning instead of sorting and yield O(n) performance.
But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n) by the simple fact that you will need to examine all of the elements at least once to ensure that your median is correct.
Here's a solution in Java, based on TreeSet:
public class SetWithMedian {
private SortedSet<Integer> s = new TreeSet<Integer>();
private Integer m = null;
public boolean contains(int e) {
return s.contains(e);
}
public Integer getMedian() {
return m;
}
public void add(int e) {
s.add(e);
updateMedian();
}
public void remove(int e) {
s.remove(e);
updateMedian();
}
private void updateMedian() {
if (s.size() == 0) {
m = null;
} else if (s.size() == 1) {
m = s.first();
} else {
SortedSet<Integer> h = s.headSet(m);
SortedSet<Integer> t = s.tailSet(m + 1);
int x = 1 - s.size() % 2;
if (h.size() < t.size() + x)
m = t.first();
else if (h.size() > t.size() + x)
m = h.last();
}
}
}
Removing the median (i.e. "s.remove(s.getMedian())") takes O(log n) time.
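A quick usage sketch, assuming the class above (for an even number of elements this implementation keeps the upper of the two middle values as the median):
SetWithMedian set = new SetWithMedian();
set.add(3);
set.add(1);
set.add(2);
System.out.println(set.getMedian()); // 2
set.remove(set.getMedian());         // removes the median in O(log n)
System.out.println(set.getMedian()); // 3, the upper of the two remaining values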
Edit: To help understand the code, here's the invariant condition of the class attributes:
private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m)
            && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}
In human-readable form:
If the set "s" is empty, then "m" must be null.
If the set "s" is not empty, then it must contain "m".
Let x be the number of elements strictly less than "m", and let y be the number of elements greater than or equal to "m". Then, if the total number of elements is even, x must be equal to y; otherwise, x + 1 must be equal to y.
Try a red-black tree. It should work quite well, and with a binary search you get your log(n). It also has remove and insert times of log(n), and rebalancing is done in log(n) as well.
As mentioned in previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm you look for must be executed sequentially, then the best you can do is O(n). The deterministic selection algorithm (median-of-medians) or BFPRT algorithm will solve the problem with a worst case of O(n). You can find more about that here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
However, the median of medians algorithm can be made to run faster than O(n) by making it parallel. Due to its divide-and-conquer nature, the algorithm can be "easily" made parallel. For instance, when dividing the input array into groups of 5 elements, you could potentially launch a thread for each sub-array, sort it and find the median within that thread. When this step finishes, the threads are joined and the algorithm is run again with the newly formed array of medians.
Note that such a design would only be beneficial for really large data sets; the additional overhead of spawning and joining threads makes it infeasible for smaller sets. This page has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html
Note that you can find asymptotically faster algorithms out there, however they are not practical enough for daily use. Your best bet is the already mentioned sequential median-of-medians algorithm.
Master Yoda's randomized algorithm has, of course, a minimum complexity of n like any other, an expected complexity of n (not log n) and a maximum complexity of n squared like Quicksort. It's still very good.
In practice, the "random" pivot choice might sometimes be a fixed location (without involving a RNG) because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed) or deduced from an approximate or exactly known distribution of input values.
I know one randomized algorithm with a time complexity of O(n) in expectation.
Here is the algorithm:
Input: an array of n numbers A[1..n] (without loss of generality we can assume n is even).
Output: the (n/2)-th smallest element, i.e. the median.
Algorithm(A[1..n], k = n/2):
Pick a pivot index p uniformly at random from 1..n.
Divide the remaining elements (excluding the pivot itself) into 2 parts:
L - the elements <= A[p]
R - the elements > A[p]
If k == |L| + 1, the pivot A[p] is the k-th smallest element; stop.
If k <= |L|, recurse on (L, k).
Else recurse on (R, k - (|L| + 1)).
Complexity: O(n) in expectation.
The proof is purely mathematical and about one page long; if you are interested, ping me.
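Here is a sketch of that randomized selection idea in Java; the class and method names are just for illustration, and this is an in-place, iterative rendering of the recursion described above (expected O(n), worst case O(n^2)):
import java.util.Random;

class MedianQuickselect {
    // Returns the k-th smallest element (0-based); the median of n elements is k = n / 2.
    static int select(int[] a, int k) {
        Random rnd = new Random();
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = lo + rnd.nextInt(hi - lo + 1); // random pivot index
            p = partition(a, lo, hi, p);
            if (k == p) return a[k];
            if (k < p) hi = p - 1;
            else lo = p + 1;
        }
        return a[lo];
    }

    // Lomuto partition: places the pivot value at its final sorted position.
    static int partition(int[] a, int lo, int hi, int p) {
        int pivot = a[p];
        swap(a, p, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] v = {9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(select(v, v.length / 2)); // 5
    }
}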
To expand on rwong's answer, here is an example in C++:
// partial_sort example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main () {
    int myints[] = {9,8,7,6,5,4,3,2,1};
    vector<int> myvector (myints, myints+9);
    vector<int>::iterator it;

    partial_sort (myvector.begin(), myvector.begin()+5, myvector.end());

    // print out content:
    cout << "myvector contains:";
    for (it=myvector.begin(); it!=myvector.end(); ++it)
        cout << " " << *it;
    cout << endl;

    return 0;
}
Output:
myvector contains: 1 2 3 4 5 9 8 7 6
The element in the middle would be the median.
