What is the complexity of the operation set(list())?

Can you help me find out the time complexity of the set() function in Python?
A = [1,2,1,2,4]
A = set(A)

It's linear: it takes O(n) time, where n is the number of elements in the list A, as you can infer from the official Python documentation. Building the set has to hash and insert each element of the list exactly once.
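To see the linear behavior concretely, here is a small sketch that deduplicates the list from the question and times set() on lists of growing size (timings are machine-dependent and illustrative only):

```python
from timeit import timeit

# Building a set hashes each list element once and inserts it, so the
# total cost grows linearly with the length of the input list.
A = [1, 2, 1, 2, 4]
print(set(A))  # duplicates are removed

for n in (10_000, 100_000):
    data = list(range(n))
    t = timeit(lambda: set(data), number=10)
    print(f"n={n:>7}: {t:.4f}s")
```

Doubling n should roughly double the reported time.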

Related

Search a Sorted Array for First Occurrence of K

I'm trying to solve question 11.1 in Elements of Programming Interviews (EPI) in Java: Search a Sorted Array for First Occurrence of K.
The problem description from the book:
Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array.
The solution they provide in the book is a modified binary search algorithm that runs in O(log n) time. I wrote my own algorithm, also based on a modified binary search, with a slight difference: it uses recursion. The problem is I don't know how to determine the time complexity of my algorithm. My best guess is that it runs in O(log n) time, because each time the function is called it reduces the size of the candidate range by half. I've tested my algorithm against the 314 EPI test cases provided by the EPI Judge, so I know it works; I just don't know the time complexity. Here is the code:
public static int searchFirstOfKUtility(List<Integer> A, int k, int Lower, int Upper, Integer Index) {
    while (Lower <= Upper) {
        int M = Lower + (Upper - Lower) / 2;
        if (A.get(M) < k) {
            Lower = M + 1;
        } else if (A.get(M) == k) {
            Index = M;
            if (Lower != Upper) {
                Index = searchFirstOfKUtility(A, k, Lower, M - 1, Index);
            }
            return Index;
        } else {
            Upper = M - 1;
        }
    }
    return Index;
}
Here is the code that the tests cases call to exercise my function:
public static int searchFirstOfK(List<Integer> A, int k) {
    Integer foundKey = -1;
    return searchFirstOfKUtility(A, k, 0, A.size() - 1, foundKey);
}
So, can anyone tell me what the time complexity of my algorithm would be?
Assuming that passing arguments is O(1) instead of O(n), performance is O(log(n)).
The usual theoretical approach for analyzing recursion is to apply the Master Theorem. It says that if the performance of a recursive algorithm follows a recurrence of the form
T(n) = a T(n/b) + f(n)
then there are 3 cases. In plain English they correspond to:
Performance is dominated by all the calls at the bottom of the recursion, so is proportional to how many of those there are.
Performance is equal between each level of recursion, and so is proportional to how many levels of recursion there are, times the cost of any layer of recursion.
Performance is dominated by the work done in the very first call, and so is proportional to f(n).
You are in case 2. Each recursive call costs the same, and so performance is dominated by the fact that there are O(log(n)) levels of recursion times the cost of each level. Assuming that passing a fixed number of arguments is O(1), that will indeed be O(log(n)).
Note that this assumption is true for Java because you don't make a complete copy of the array before passing it. But it is important to be aware that it is not true in all languages. For example, I recently did a bunch of work in PL/pgSQL, where arrays are passed by value, meaning that your algorithm would have been O(n log(n)).
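To see case 2 concretely, here is a Python sketch of the same recursive first-occurrence search, instrumented to report the recursion depth (the function name and the (index, depth) return shape are my own, not from EPI):

```python
def search_first_of_k(A, k, lower, upper, depth=0):
    """Recursive first-occurrence binary search, mirroring the Java code.

    Returns (index, max_depth). Each recursive call halves the candidate
    range, so max_depth is O(log n), and each level does O(1) work.
    """
    while lower <= upper:
        m = lower + (upper - lower) // 2
        if A[m] < k:
            lower = m + 1
        elif A[m] == k:
            # Found one occurrence; keep searching to its left.
            if lower != upper:
                left, d = search_first_of_k(A, k, lower, m - 1, depth + 1)
                return (left if left != -1 else m), d
            return m, depth
        else:
            upper = m - 1
    return -1, depth

A = [1, 2, 2, 2, 3, 5, 8]
idx, levels = search_first_of_k(A, 2, 0, len(A) - 1)
print(idx, levels)  # first occurrence of 2 is at index 1
```

Running it on larger sorted inputs shows the depth growing roughly like log2(n), which is exactly the "number of levels times O(1) per level" argument above.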

Algorithmic complexity of generating Hamming numbers (not codes)

Hamming numbers are numbers of the form 2^a * 3^b * 5^c. If I want to generate the nth Hamming number, one way to do this is to use a min-heap and a hashset as in the following pseudo-code:
heap = [1]
seen = hashset()
answer = []
while len(answer) < n:
    x = heap.pop()
    answer.append(x)
    for f in [2, 3, 5]:
        if f*x not in seen:
            heap.push(f*x)
            seen.add(f*x)
return answer[-1]
I think this is O(n log n) in time complexity: each time we run the body of the while loop, we do one pop and up to three pushes, each of which is logarithmic in the size of the heap, and the size of the heap is at worst linear in the number of times we've executed the loop.
Questions:
Is my timing analysis correct?
Is there an algorithm which can do this faster, i.e. with time complexity linear in n? What is the best possible time complexity for this problem?
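On the second question: there is a classic approach, usually attributed to Dijkstra, that replaces the heap with three index pointers merging the streams 2*H, 3*H, 5*H, doing O(1) work per generated number (this sketch is mine, not from the question; it counts arithmetic on growing integers as O(1)):

```python
def nth_hamming(n):
    """Three-pointer merge: O(n) operations, O(n) space."""
    h = [1]
    i2 = i3 = i5 = 0  # index of the next value to multiply by 2, 3, 5
    while len(h) < n:
        nxt = min(2 * h[i2], 3 * h[i3], 5 * h[i5])
        h.append(nxt)
        # Advance every pointer that produced nxt, so duplicates
        # (e.g. 6 = 2*3 = 3*2) are emitted only once.
        if nxt == 2 * h[i2]:
            i2 += 1
        if nxt == 3 * h[i3]:
            i3 += 1
        if nxt == 5 * h[i5]:
            i5 += 1
    return h[n - 1]

print(nth_hamming(10))  # -> 12 (sequence: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12)
```

Each loop iteration appends exactly one Hamming number after a constant number of comparisons, which is where the linear bound comes from.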

Calculating median with a Black Box algorithm in O(n) time

The problem is this:
given an array A of size n and an algorithm B with B(A, n) = b, where b is an element of A such that
n/10 <= |{1 <= i <= n | a_i > b}| <= 9n/10
(so b splits A so that neither side holds more than 9n/10 of the elements).
The time complexity of B is O(n).
I need to find the median in O(n).
I tried solving this question by applying B and then finding the group of elements that are smaller than b; let's name this group C,
and the group of elements bigger than b; let's name it D.
We can get groups C and D by traversing through array A in O(n).
Now I can apply algorithm B again on the group that must contain the median (the median cannot be in the other one), and applying the same principle repeatedly I can get to the median element. That gives time complexity O(n log n).
I can't seem to find a solution that works in O(n).
This is a homework question and I would appreciate any help or insight.
You are supposed to use function B() to choose a pivot element for the Quickselect algorithm: https://en.wikipedia.org/wiki/Quickselect
It looks like you are already thinking of exactly this procedure, so you already have the algorithm, and you're just calculating the complexity incorrectly.
In each iteration, you run a linear time procedure on a list that is at most 9/10ths the size of the list in the previous iteration, so the worst case complexity is
O( n + n*0.9 + n*0.9^2 + n*0.9^3 ...)
Geometric progressions like this converge to a constant multiplier:
Let T = 1 + 0.9^1 + 0.9^2 + ...
It's easy to see that
T - T*0.9 = 1, so
T*(0.1) = 1, and T=10
So the total number of elements processed through all iterations is less than 10n, and your algorithm therefore takes O(n) time.
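The recursion can be sketched as follows (my own illustration: since the real B is a black box, I simulate it here with an exact median picker; any pivot function meeting the n/10 balance guarantee gives the same O(n) bound):

```python
import statistics

def quickselect(A, rank, pivot_fn):
    """Return the element of 0-based `rank` in sorted order of A.

    `pivot_fn` stands in for the black-box B: it must return an element
    of A with at least n/10 elements above it and at least n/10 below,
    so each recursive call shrinks the list to at most 9/10 of its size.
    """
    b = pivot_fn(A)
    smaller = [x for x in A if x < b]   # one O(n) partition pass
    equal = [x for x in A if x == b]
    bigger = [x for x in A if x > b]
    if rank < len(smaller):
        return quickselect(smaller, rank, pivot_fn)
    if rank < len(smaller) + len(equal):
        return b
    return quickselect(bigger, rank - len(smaller) - len(equal), pivot_fn)

# Simulated B for illustration only; a real B need not be this strong.
median_of = statistics.median_low

A = [7, 1, 5, 3, 9, 2, 8, 4, 6]
print(quickselect(A, len(A) // 2, median_of))  # -> 5, the median
```

The partition pass is O(current size), and the sizes shrink geometrically, which is exactly the 10n bound derived above.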

Big O for exponential complexity specific case

Consider an algorithm to find all paths between two nodes in a directed, acyclic, non-weighted graph that may contain more than one edge between the same two vertices. (This DAG is just an example; I'm not discussing this case specifically, so disregard its correctness, though I think it's correct.)
We have two factors affecting the complexity:
mc: the max number of outgoing edges from a vertex.
ml: the length of the longest path, measured in number of edges.
Using an iterative approach to solve the problem, where "complexity" below means the count of processing operations performed:
for the first iteration the complexity = mc
for the second iteration the complexity = mc*mc
for the third iteration the complexity = mc*mc*mc
for the ml-th (longest path) iteration the complexity = mc^ml
Total worst-case complexity is (mc + mc*mc + ... + mc^ml).
1- Can we say it's O(mc^ml)?
2- Is this exponential complexity? As far as I know, in exponential complexity the variable appears only in the exponent, not in the base.
3- Are mc and ml both variables in my algorithm's complexity?
There's a faster way to get the answer in O(V + E), but it seems like your question is about calculating complexity, not about optimizing the algorithm.
Yes, it's O(mc^ml): the last term dominates the geometric sum.
Yes, they both can be variables in your algorithm's complexity.
As for the complexity of your algorithm, let's do a transformation, using the fact that a^b = e^(b*ln(a)):
mc^ml = (e^ln(mc))^ml = e^(ml*ln(mc)) <= e^(ml*mc), since ln(mc) <= mc.
So your algorithm's complexity upper bound is O(e^(ml*mc)). To see that this really is exponential complexity, let M = ml*mc. Then e^(ml*mc) = e^M = C^M with C = e, i.e. the bound is exponential in the parameter M, which grows with the input. So, basically: yes, it is exponential complexity, and the variables appear only in the exponent once the base is rewritten as the constant C.
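As a concrete illustration (my own sketch, not from the question), here is the iterative scheme: partial paths are extended level by level, and each extension step can multiply the number of partial paths by up to mc, over at most ml steps, giving the mc^ml worst case:

```python
from collections import defaultdict

def all_paths(edges, src, dst):
    """Enumerate every path from src to dst in a DAG given as an edge list.

    Duplicate (u, v) entries model parallel edges between the same pair
    of vertices. Each round can grow the frontier by a factor of up to
    mc (max out-degree), over at most ml rounds: mc^ml paths worst case.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    frontier = [[src]]
    found = []
    while frontier:
        nxt = []
        for path in frontier:
            if path[-1] == dst:
                found.append(path)
                continue
            for v in adj[path[-1]]:
                nxt.append(path + [v])
        frontier = nxt
    return found

# Two parallel edges a->b, plus a->c, both reaching d: 3 distinct paths.
edges = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(len(all_paths(edges, "a", "d")))  # -> 3
```

Note the contrast with merely counting paths, which a dynamic program over a topological order does in O(V + E) without materializing the exponentially many paths.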

Determine overall space complexity of the program

In a program, I'm using two data structures:
1: An array of pointers of size k; each pointer points to a linked list (hence, k lists in total). The total number of nodes in all the lists = M (something like hashing with separate chaining; k is fixed, M can vary).
2: Another array of integers of size M (where M = the number of nodes above).
Question is: what is the overall space complexity of the program? Is it something like below?
First part: O(k+M) or just O(M)? Both are correct, I guess!
Second part: O(2M) or just O(M)? Again, both correct?
Overall: O(k+M) + O(2M) ==> O(max(k+M, 2M))
Or just O(M)?
Please help.
O(k+M) is O(M) if M always dominates k; here k is fixed, so k = O(M). So the final result is O(M).
First part: O(k+M) simplifies to just O(M), since k is a fixed constant.
Second part: O(2M) is written as just O(M), because we drop constant factors in big-O notation.
Overall: O(M) + O(M) ==> O(M).
Both are correct in the two cases. But since O(k+M) = O(M) when k is constant, everybody will use the simplest notation, which is O(M).
For the second part, a single array of size M is O(M).
For the overall, it would be O(k+M+M) = O(max(k+M, 2M)) = O(M) (we can "forget" multiplicative and additive constants in big-O notation, except in the constant-time case).
As a reminder, g(x) = O(f(x)) iff there exist x0 and c > 0 such that x > x0 implies g(x) <= c*f(x).
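For concreteness, here is a minimal sketch (all names are mine) of the two structures from the question: k chain heads holding M nodes in total, plus a second array of M integers, i.e. O(k) + O(M) + O(M) = O(k + M) space, which is O(M) for fixed k:

```python
class Node:
    """One linked-list node in a separate-chaining bucket."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def build(k, values):
    buckets = [None] * k          # array of k head pointers: O(k)
    for v in values:              # M nodes spread over the chains: O(M)
        i = hash(v) % k
        buckets[i] = Node(v, buckets[i])
    aux = [0] * len(values)       # second array of size M: O(M)
    return buckets, aux

def count_nodes(buckets):
    total = 0
    for node in buckets:
        while node is not None:
            total += 1
            node = node.next
    return total

buckets, aux = build(8, range(100))
print(count_nodes(buckets), len(aux))  # 100 nodes, 100 ints
```

Whatever k is, the node count and the auxiliary array are both exactly M, which is why the O(M) term dominates.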
