Time complexity of while loop until list is empty? - performance

My pseudo code is currently constructed as:
WHILE list is not empty
    FOR item in list
        do stuff to find an item that matches certain criteria
    REMOVE matching item from list
I am having a hard time wrapping my mind around what the time complexity will be, because after each iteration the list gets smaller. Any thoughts? I thought o(n), since there is only one for loop per while iteration.

Be careful. o(n) and O(n) are very different.
“Only one for loop per iteration” - that one loop is not constant time, so your argument is wrong. As written, the inner loop does O(n) work and the outer loop can run up to n times, which is O(n²) in the worst case.
You seem to assume there is always a “matching item” that can be removed. If that isn’t true then your algorithm is fatally flawed.
Unless the matching criteria change when an item is removed, you can make a single pass through the list without restarting the search from the beginning, so this can be done in O(n), where n is the size of the initial list.
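A minimal Python sketch of that single-pass idea; the `matches` predicate here is a placeholder standing in for "certain criteria":

```python
def matches(item):
    # Hypothetical criteria: stands in for whatever test you actually need.
    return item % 2 == 0

def remove_all_matches(items):
    # One pass over the list: keep only the non-matching items. O(n) overall,
    # instead of restarting the scan after every removal (O(n^2)).
    return [item for item in items if not matches(item)]

print(remove_all_matches([1, 2, 3, 4, 5]))  # [1, 3, 5]
```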

Related

Quick sort where the first element in the array is smaller

Consider an array of elements {10,5,20,15,25,22,21}.
Here, I take the pivot element as 21 (the last in the array). According to most of the quicksort algorithms I saw on the Internet, the first element is compared with the pivot element; if it is smaller, it gets swapped with the element at the boundary index. But the algorithm seems to break when the first element of the array is small, which makes it difficult for me to write down the intermediate steps the quicksort will go through.
Everyone on the Internet explains with an example array whose first element is greater than the pivot, so on comparing they didn't swap and simply moved to the next element.
Please help.
My suggestion on how to understand the quick sort:
The key to understand quick sort is the partition procedure, which is usually a for loop. Keep in mind that:
our goal is for the array to consist of three parts by the end of the loop: the first part is smaller than the pivot, the second part is equal to or larger than the pivot, and the last part is the unsorted part (which by then has no elements).
at the very beginning of the loop we also have three parts: the first part (which has no elements) is smaller than the pivot, the second part (which has no elements) is equal to or larger than the pivot, and the last part is the unsorted part (which has array.length - 1 elements).
During the loop we compare and swap as needed, so that at every step those three parts are maintained: the sizes of the first two parts grow, and the size of the last part shrinks.
On your request in the comment:
Check this link: https://www.cs.rochester.edu/~gildea/csc282/slides/C07-quicksort.pdf
Read the three example figures VERY carefully and make sure you have understood them.
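A minimal Python sketch of the Lomuto-style partition with the last element as pivot, run on the array from the question. The point that resolves the confusion: when the current element is smaller than the pivot and no larger element has been seen yet, the swap is simply an element swapping with itself, so nothing breaks:

```python
def partition(a, lo, hi):
    # Lomuto partition: pivot is the last element a[hi].
    # Invariant: a[lo:i] < pivot, a[i:j] >= pivot, a[j:hi] unexamined.
    pivot = a[hi]
    i = lo  # boundary: everything left of i is smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]  # a no-op swap when i == j
            i += 1
    a[i], a[hi] = a[hi], a[i]  # place the pivot between the two parts
    return i

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)

arr = [10, 5, 20, 15, 25, 22, 21]
quicksort(arr)
print(arr)  # [5, 10, 15, 20, 21, 22, 25]
```

With pivot 21, the first four elements (10, 5, 20, 15) are each smaller, so each "swaps" with itself and the boundary just advances; the first real move happens when the pivot is finally swapped into position 4.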

Time Complexity of searching

There is a sorted array of very large size. Every element is repeated more than once, except one element. How much time will it take to find that element?
Options are:
1. O(1)
2. O(n)
3. O(log n)
4. O(n log n)
The answer to the question is O(n) and here's why.
Let's first summarize the knowledge we're given:
A large array containing elements
The array is sorted
Every item except for one occurs more than once
The question is: what is the time growth of searching for the one item that occurs only once?
The array is sorted - can we use this to speed up the search for the item? Yes and no.
First of all, since the array isn't sorted by the property we must use to look for the item (having only one occurrence), we cannot exploit the sorted order directly. This means that optimized search algorithms, such as binary search, are out.
However, we know that if the array is sorted, then all items with the same value are grouped together. This means that when we see an item for the first time, we only have to compare it to the following item. If it's different, we've found the item we're looking for.
"See for the first time" is important: at every boundary between two groups of items the adjacent items differ, so comparing an item to its successor without knowing that it is the first of its run would give false positives.
So we have to move from one end of the array to the other, and compare each item to the following item, and this is an O(n) operation.
Basically, since the array isn't sorted by the property we're looking at, we're back to a linear search.
Must be O(n).
The fact that it's sorted doesn't help. Suppose you tried a binary method, jumping into the middle somewhere. You see that the value there has a neighbour that is the same. Now which half do you go to?
How would you write a program to find the value? You'd start at one end and check for an element whose neighbour is not the same. You'd have to walk the whole array until you found the value. So it's O(n).
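A minimal Python sketch of that scan, skipping over each run of equal values and stopping at the run of length 1:

```python
def find_unique(sorted_arr):
    # Walk the sorted array run by run; every value except one is
    # repeated, so the run of length 1 holds the unique element. O(n).
    i, n = 0, len(sorted_arr)
    while i < n:
        j = i
        while j < n and sorted_arr[j] == sorted_arr[i]:
            j += 1  # advance past the current run of equal values
        if j - i == 1:
            return sorted_arr[i]
        i = j
    return None  # no unique element found

print(find_unique([2, 2, 2, 5, 7, 7, 9, 9, 9, 9]))  # 5
```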

Circular Linked list and the iterator

I have a question from my algorithms and data structures class.
For which of the following representations can all basic queue operations be performed in constant worst-case time?
To achieve constant worst-case time with the circular linked list, where should I keep the iterator?
They have given two choices:
Maintain an iterator that corresponds to the first item in the list
Maintain an iterator that corresponds to the last item in the list.
My answer is that to get constant worst-case time we should maintain the iterator that corresponds to the last item in the list, but I don't know how to justify and explain it. What are the important points needed to justify this answer?
For which of the following representations can all basic queue operations be performed in constant worst-case time?
My answer is that to get the worst-case time we should maintain the iterator that corresponds to the last item
Assuming that your circular list is singly-linked, and that "the last item" in the circular list is the one that has been inserted the latest, your answer is correct *. In order to prove that you are right, you need to demonstrate how to perform these four operations in constant time:
Get the front element - Since the queue is circular and you have an iterator pointing to the latest inserted element, the next element from the latest inserted is the front element (i.e. the earliest inserted).
Get the back element - Since you maintain an iterator pointing to the latest inserted element, getting the back of the queue is a matter of dereferencing the iterator.
Enqueue - This is a matter of inserting after the iterator that you hold, and moving the iterator to the newly inserted item.
Dequeue - Copy the content of the front element (described in #1) into a temporary variable, re-point the next link of the latest inserted element to that of the front element, and delete the front element.
Since none of these operations require iterating the list, all of them can be performed in constant time.
* With doubly-linked circular lists both answers would be correct.
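The four operations above can be sketched in Python as follows (the class and method names are illustrative); only a reference to the last, i.e. most recently enqueued, node is kept:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = self  # a lone node points to itself (circular)

class CircularQueue:
    # Singly-linked circular list; `last` points to the latest
    # inserted node, so last.next is always the front. All ops O(1).
    def __init__(self):
        self.last = None

    def front(self):
        return self.last.next.value   # node after last is the front

    def back(self):
        return self.last.value        # just dereference the iterator

    def enqueue(self, value):
        node = Node(value)
        if self.last is not None:
            node.next = self.last.next  # new node points to the front
            self.last.next = node       # old last points to new node
        self.last = node                # iterator moves to the new node

    def dequeue(self):
        front = self.last.next
        if front is self.last:           # queue had a single element
            self.last = None
        else:
            self.last.next = front.next  # unlink the front node
        return front.value

q = CircularQueue()
for v in (1, 2, 3):
    q.enqueue(v)
print(q.front(), q.back())  # 1 3
print(q.dequeue())          # 1
```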

Finding the repeated element

In an array with integers between 1 and 1,000,000 (or some much larger value), a single value occurs twice. How do you determine which one?
I think we can use a bitmap to mark the elements and then traverse the array again to find the repeated element, but that seems like a high-complexity process. Is there a better way?
This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Gauss's formula we can quickly compute the expected sum of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and add them up. The difference between this sum and the expected sum is the duplicate value.
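A minimal Python sketch of this trick, assuming the array holds every integer 1..N exactly once plus one extra copy of a single value:

```python
def find_duplicate_full_range(arr, n):
    # Gauss: 1 + 2 + ... + n == n * (n + 1) / 2.
    # Any surplus over that expected sum is the duplicated value.
    expected = n * (n + 1) // 2
    return sum(arr) - expected

print(find_duplicate_full_range([1, 2, 3, 4, 3, 5], 5))  # 3
```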
EDIT: As others have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can do an in-place sort of the list, checking adjacent elements as you sort. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average, which establishes an upper bound for finding the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
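A minimal Python sketch of the speed-optimized approach, using Python's built-in `set` in place of a HashSet:

```python
def find_duplicate(values):
    # The first value we see that is already in the set is the
    # duplicate. O(n) average time, O(n) extra space.
    seen = set()
    for v in values:
        if v in seen:
            return v
        seen.add(v)
    return None  # no duplicate present

print(find_duplicate([7, 3, 9, 3, 1]))  # 3
```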
You may try to use bits as a hash map:
1 at position k means that number k occurred before
0 at position k means that number k did not occur before
pseudocode:
1. assume that your array is A
2. initialize a bit array (there is a nice class for this in C#) of length 1,000,000, filled with zeros
3. for each num in A:
       if bitarray[num]:
           return num
       else:
           bitarray[num] = 1
The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.
Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But it's easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

Find a common element within N arrays

If I have N arrays, what is the best way (in terms of time complexity; space is not important) to find the common elements? You could just find one element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index with elements as keys and counts as values. Loop through all values and update the counts in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1); combined with looping through all M elements, the whole procedure is O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
if you know that the elements are (positive) integers with a maximum value that is not too high, you could just use a normal array as the "hash" index to keep counts, where the numbers themselves are the array indices.
I've assumed that each number occurs only once per array. Adapting it for repeated occurrences should be easy (set the i-th bit of the count for the i-th array, or only update the count if the current element's count == i-1).
EDIT: when I answered the question, it did not yet include the part about "a better way" than hashing.
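A minimal Python sketch of this counting approach; deduplicating each array with `set()` first is one way to handle the repeated-occurrence case mentioned above:

```python
from collections import Counter

def find_common(arrays):
    # Count, per value, how many arrays contain it. Each array is
    # deduplicated with set() so repeats within one array count once.
    # A value with count == len(arrays) is common to all; O(M) average
    # for M total elements, assuming O(1) hashing.
    counts = Counter()
    for arr in arrays:
        counts.update(set(arr))
    n = len(arrays)
    for value, count in counts.items():
        if count == n:
            return value  # stop at the first common element found
    return None

print(find_common([[1, 1, 2, 3, 4, 5], [1, 3, 6, 7]]))  # 1 or 3
```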
The most direct method is to intersect the first 2 arrays and then intersect the result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language in which you're working, or you require a more specific answer (i.e. you need the answer to 'how do you do the intersection'), then modify your question accordingly.
Without sorting there isn't an optimized way to do this based on the information given (i.e. sorting, positioning all elements relative to each other, then iterating over the length of the arrays checking for elements present in all the arrays at once).
The question asks whether there is a better way than hashing. There is no better way (i.e. no better time complexity), since the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one to an array maintaining counts; the time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, and then there is the log N factor for sorting each array.
Back to hashing, from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding onto the next array. This will take advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases when common elements appear in the same regions of the array (e.g. common elements at the start of all arrays.) Worst case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Say you have two arrays:
1: {1, 1, 2, 3, 4, 5}
2: {1, 3, 6, 7}
Then the answer should be 1 and 3. But with the hashtable approach, 1 will have count 3, and we will never find it in this situation.
The problem becomes even more complex with input like this:
1: {1, 1, 1, 2, 3, 4}
2: {1, 1, 5, 6}
Here I think the output should be 1, 1. The suggested approach fails in both cases.
Solution:
Read the first array and put it into a hashtable; if we find the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements, each with a count of 2.
But again, this approach will still fail on the second input set I gave earlier.
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N²).