Complexity for binary search in a non random access structure - algorithm

Performing binary search on a sorted array has O(logN) complexity where N is the number of elements in the array.
But if we perform binary searches in a sorted (linked) list then what is the complexity?
We are doing logN comparisons of the middle element of the range but to get to the range the complexity is O(N) due to the fact that the list is not a random access structure.
So is the time complexity:
1) logN * O(N) = O(N) treating logN as a constant? or
2) logN*O(N) = O(NlogN) meaning that logN = O(logN) in all cases?
What is correct here? 1 or 2?

The second assumption is correct and the first is wrong. Asymptotic analysis deals with growth: if the number of nodes increases, log(n) also increases, so you can't treat it as a constant. As a very basic example (taking logs base 10), if 10 nodes took 10 seconds to process, then estimating 200 seconds for 100 nodes is more accurate than estimating 100 seconds (which neglects the log(n) factor).
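To make the growth concrete, here is a minimal sketch that counts node hops for a binary search over a singly linked list. It assumes every probe of the middle element walks from the head of the list, which is the reading of the question that gives logN probes of O(N) each. The names `Node`, `build_list`, `node_at` and `binary_search` are illustrative, not from the original post.

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def build_list(values):
    """Build a linked list from a Python sequence and return its head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def node_at(head, index):
    """Walk from the head to position `index`; return the node and the hops used."""
    node = head
    hops = 0
    for _ in range(index):
        node = node.next_node
        hops += 1
    return node, hops


def binary_search(head, length, target):
    """Binary search on a sorted linked list.

    Returns (index, total_hops); index is -1 if the target is absent.
    Assumption: each probe restarts from the head.
    """
    lo, hi = 0, length - 1
    total_hops = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        node, hops = node_at(head, mid)
        total_hops += hops
        if node.value == target:
            return mid, total_hops
        if node.value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, total_hops


if __name__ == "__main__":
    # Searching for the largest element is the expensive case for this variant.
    for n in (16, 256, 4096):
        head = build_list(list(range(n)))
        _, hops = binary_search(head, n, n - 1)
        print(n, hops)
```

Running it for growing n shows the hop count growing faster than n itself, which is why the logN factor cannot be dropped.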

Related

Quicksort Analysis

Question:
Here's a modification of quick sort: whenever we have ten items or fewer in a sublist, we sort the sublist using selection sort rather than further recursing with quicksort. Does this change the big-oh time complexity of quicksort? Explain.
In my opinion the big-oh time complexity would change. We know that selection sort is O(n^2) and therefore sorting the sublist of ten items or fewer would take O(n^2). Until we get to a sublist that has ten or fewer items we would use quicksort and keep partitioning the list. So in the end we would have O( nlogn + n^2) which is O(n^2).
Am I correct? If not, could someone explain why?
The reason that the time complexity is actually unaffected is that 10 is a constant term. No matter how large the total array is, it always takes a constant amount of time to sort subarrays of size 10 or less. If you are sorting a list with one million elements, that constant 10 is going to play a very small role in the actual time it takes to sort the list (most of the time will be spent partitioning the original array into subarrays recursively).
If sorting a list of at most 10 elements takes constant time, partitioning at each level of the recursion is linear in total, and you end up with at most n subarrays of 10 items or fewer, then the total is O(n log n + n), which is the same as O(n log n).
Saying that selection sort is O(n^2) means that the running time of the algorithm increases quadratically with the size of the input. Running selection sort on an array with a constant number of elements will always take constant time on its own, but if you were to compare the running time of selection sort on arrays of varying sizes, you would see a quadratic increase in the running time as input size varies linearly.
The big O complexity does not change. Please read up on the Master Method (aka Master Theorem) https://en.wikipedia.org/wiki/Master_theorem
If you think through the algorithm as the size of the list grows very large, the time to sort the final ten or fewer items in any given recursion subtree makes an insignificant contribution to the overall running time.
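The modification described in the question can be sketched as follows. This is only one possible version: the Lomuto partition with the last element as pivot and the names `CUTOFF`, `selection_sort`, `partition` and `hybrid_quicksort` are assumptions made for illustration, since the question does not specify them.

```python
CUTOFF = 10  # sublists of this size or smaller go to selection sort


def selection_sort(items, lo, hi):
    """Sort items[lo..hi] in place; constant work when hi - lo + 1 <= CUTOFF."""
    for i in range(lo, hi):
        smallest = i
        for j in range(i + 1, hi + 1):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]


def partition(items, lo, hi):
    """Lomuto partition around items[hi]; returns the pivot's final index."""
    pivot = items[hi]
    i = lo
    for j in range(lo, hi):
        if items[j] < pivot:
            items[i], items[j] = items[j], items[i]
            i += 1
    items[i], items[hi] = items[hi], items[i]
    return i


def hybrid_quicksort(items, lo=0, hi=None):
    """Quicksort that stops recursing once a sublist has CUTOFF items or fewer."""
    if hi is None:
        hi = len(items) - 1
    if hi - lo + 1 <= CUTOFF:
        selection_sort(items, lo, hi)
        return
    p = partition(items, lo, hi)
    hybrid_quicksort(items, lo, p - 1)
    hybrid_quicksort(items, p + 1, hi)
```

Each call to `selection_sort` here touches at most 10 items, so its cost is bounded by a constant no matter how large the whole list is.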

Is n operations of O(1) average time each considered O(n) in average?

I'm studying for a data structures exam and I'm trying to solve this question:
Given an array of n numbers and a number Z, find x, y such that x + y = Z, in O(n) average time.
My suggestion is to move the array's content to a hash table and, using open addressing, do the following:
For each number A[i] search for Z-A[i] in the hash table (O(1) in average for each operation.) Worst case you'll perform n searches, O(1) average time each, that's O(n) in average.
Is my analysis correct?
Since you traverse the whole array a second time and do one hash lookup per element, the cost is O(n) * O(1) (not O(n) + O(1), as I previously stated), where O(1) is the average lookup time. So yes, the algorithm is O(n) on average.
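A minimal sketch of the approach, using Python's built-in `set` as a stand-in for the hash table in the question. The function name `find_pair` is made up for illustration, and this version merges the build and the search into a single pass so that an element is never paired with itself.

```python
def find_pair(numbers, z):
    """Return (x, y) from `numbers` with x + y == z, or None if no pair exists.

    Each membership test and insertion on the set is O(1) on average,
    so n of them cost O(n) on average.
    """
    seen = set()
    for x in numbers:
        y = z - x
        if y in seen:       # average O(1) lookup
            return y, x
        seen.add(x)         # average O(1) insertion
    return None
```

For example, `find_pair([3, 9, 4, 7], 11)` finds the pair 4 and 7, while `find_pair([5], 10)` returns `None` because a single 5 cannot be used twice.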

Amortized Analysis of Algorithms

I am currently reading amortized analysis. I am not able to fully understand how it is different from normal analysis we perform to calculate average or worst case behaviour of algorithms. Can someone explain it with an example of sorting or something ?
Amortized analysis gives the average performance (over time) of each operation in the worst case.
In a sequence of operations the worst case does not occur on every operation: some operations may be cheap and some may be expensive. Therefore, a traditional worst-case-per-operation analysis can give an overly pessimistic bound. For example, in a dynamic array only some inserts take linear time, while the others take constant time.
When different inserts take different times, how can we accurately calculate the total time? The amortized approach is going to assign an "artificial cost" to each operation in the sequence, called the amortized cost of an operation. It requires that the total real cost of the sequence should be bounded by the total of the amortized costs of all the operations.
Note, there is sometimes flexibility in the assignment of amortized costs.
Three methods are used in amortized analysis:
Aggregate Method (or brute force)
Accounting Method (or the banker's method)
Potential Method (or the physicist's method)
For instance, assume we're sorting a list in which all the keys are distinct (since this is the slowest case when we don't do anything special with keys that equal the pivot).
Quicksort chooses a random pivot. The pivot is equally likely to be the smallest key, the second smallest, the third smallest, ..., or the largest. For each key, the probability is 1/n. Let T(n) be a random variable equal to the running time of quicksort on n distinct keys. Suppose quicksort picks the ith smallest key as the pivot. Then we run quicksort recursively on a list of length i − 1 and on a list of length n − i. It takes O(n) time to partition and concatenate the lists (let's say at most n dollars), so the running time is
T(n) <= n + T(i − 1) + T(n − i)
Here i is a random variable that can be any number from 1 (pivot is the smallest key) to n (pivot is the largest key), each chosen with probability 1/n, so
E[T(n)] <= n + (1/n) * sum over i = 1..n of (E[T(i − 1)] + E[T(n − i)])
This equation is called a recurrence. The base cases for the recurrence are T(0) = 1 and T(1) = 1. This means that sorting a list of length zero or one takes at most one dollar (unit of time).
Solving this recurrence gives a bound of the form
E[T(j)] <= 1 + 8j log_2 j
The expression 1 + 8j log_2 j might be an overestimate, but it doesn't matter. The important point is that this proves that E[T(n)] is in O(n log n). In other words, the expected running time of quicksort is in O(n log n).
Also, there's a subtle but important difference between amortized running time and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small possibility that quicksort will cost Ω(n^2) dollars, but the probability that this will happen approaches zero as n grows large.
Quicksort O(n log n) expected time
Quickselect Θ(n) expected time
For a numeric example:
The comparison-based sorting lower bound is Ω(n log n) comparisons in the worst case.
Finally you can find more information about quicksort average case analysis here
average - a probabilistic analysis, the average is in relation to all of the possible inputs, it is an estimate of the likely run time of the algorithm.
amortized - non probabilistic analysis, calculated in relation to a batch of calls to the algorithm.
example - dynamic sized stack:
say we define a stack of some size, and whenever we use up the space, we allocate twice the old size, and copy the elements into the new location.
overall our costs are:
O(1) per insertion / deletion
O(n) per insertion ( allocation and copying ) when the stack is full
so now we ask, how much time would n insertions take?
one might say O(n^2), however we don't pay O(n) for every insertion.
so that estimate is too pessimistic; the correct answer is O(n) time for n insertions. let's see why:
let's say we start with array size = 1.
ignoring copying, we would pay O(n) for n insertions.
now, we do a full copy only when the stack has one of these numbers of elements:
1,2,4,8,...,n/2,n
for each of these sizes we do a copy and alloc, so to sum the cost we get:
const*(1+2+4+8+...+n/4+n/2+n) = const*(n+n/2+n/4+...+8+4+2+1) <= const*n(1+1/2+1/4+1/8+...)
where (1+1/2+1/4+1/8+...) = 2
so we pay O(n) for all of the copying + O(n) for the actual n insertions
O(n) worst case for n operations -> O(1) amortized per operation.
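The doubling argument above can be checked with a small sketch that counts element copies. The class name `DynamicStack` and its `copies` counter are hypothetical, added only to make the bookkeeping visible; a real implementation would not track them.

```python
class DynamicStack:
    """Array-backed stack that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0  # total elements copied during all resizes

    def push(self, value):
        if self.size == self.capacity:
            # Full: allocate twice the space and copy everything over, O(size).
            new_slots = [None] * (2 * self.capacity)
            for i in range(self.size):
                new_slots[i] = self.slots[i]
                self.copies += 1
            self.slots = new_slots
            self.capacity *= 2
        # The insertion itself is O(1).
        self.slots[self.size] = value
        self.size += 1


if __name__ == "__main__":
    for n in (1024, 100_000):
        stack = DynamicStack()
        for k in range(n):
            stack.push(k)
        # Total copies stay below 2 * n, so n pushes cost O(n) overall.
        print(n, stack.copies, stack.copies < 2 * n)
```

Starting from capacity 1, the copies happen at sizes 1, 2, 4, 8, ..., so their sum stays below 2n, matching the geometric series in the answer.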

analyzing time complexity

I have 2 arrays
a of length n
b of length m
Now I want to find all elements that are common to both arrays.
Algorithm:
Build hash map consisting of all elements of A
Now for each element x in B check if the element x is present in hashmap.
Analyzing overall time complexity
for building hashmap O(n)
for second step complexity is O(m)
So the overall complexity is O(m+n). Am I correct?
What does O(m+n) simplify to when m is much larger than n, or vice versa?
O(m) + O(n) = O(m+n), if you know that m>n then O(m+n)=O(m+m)=O(m).
Note: hashes theoretically don't guarantee O(1) lookup, but practically you can count on it (= it's the average complexity, the expected runtime for a random input).
Also note, that your algo will repeatedly signal duplicated elements of b which are also present in a. If this is a problem you have to store in the hash that you already checked/printed out that element.
Average case time complexity is O(m + n). This is what you should consider if you are doing an implementation, since hash maps usually have few collisions. O(m+n) = O(max(m, n)).
However, if this is a test question, then by time complexity people usually mean worst-case time complexity. The worst-case time complexity is O(mn), since each lookup in the second step can take O(n) time in the worst case.
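The two steps, plus the duplicate handling mentioned in the first answer, can be sketched like this. Python's `set` stands in for the hash map, and the function name `common_elements` is an assumption made for illustration.

```python
def common_elements(a, b):
    """Return the elements of b that also occur in a, each reported once.

    Building the set is O(n) on average and the scan of b is O(m) on
    average, for O(m + n) overall in the average case.
    """
    in_a = set(a)        # step 1: hash all elements of a
    reported = set()     # remembers what was already emitted
    result = []
    for x in b:          # step 2: one average O(1) lookup per element of b
        if x in in_a and x not in reported:
            reported.add(x)
            result.append(x)
    return result
```

With `a = [1, 2, 3, 4]` and `b = [3, 3, 5, 1]` this returns `[3, 1]`: the repeated 3 in `b` is reported only once.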

Insertion Sort / Heap Sort time complexity

Assume you have to sort an array with n = 1,000,000 elements. Roughly how long would insertion sort and heapsort need, assuming each basic step takes one millisecond?
I know that insertion sort takes n^2 steps in the worst case, and heapsort takes n log n steps in the worst case.
So 1,000,000 ^ 2 for insertion sort = 1*10^12 milliseconds
and 1,000,000 * log(1,000,000) for heapsort = 6,000,000 milliseconds (taking log base 10)
Is that correct?
Well...
The problem is that "order" notation is only talking about limits and comparisons, not absolute times. It also leaves off constants and lower order terms.
For example (this is totally fictitious), the actual running time for the specific insertion sort implementation you might be looking at could be:
num steps = 45,334 * n^2 + 6,500,000 * n + 2,000,000
That is an O(n^2) algorithm, but it'll take a lot more time than what you've computed.
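For a rough sense of scale only, the back-of-the-envelope arithmetic from the question can be written out as below. It treats n^2 and n log n as exact step counts with no constants or lower-order terms, which is exactly what the answer warns against, and it assumes log base 2, which the question does not state. The names `N`, `STEP_MS` and `estimate_ms` are illustrative.

```python
import math

N = 1_000_000   # number of elements, from the question
STEP_MS = 1     # one millisecond per basic step, from the question


def estimate_ms(steps):
    """Convert a step count to milliseconds at STEP_MS per step."""
    return steps * STEP_MS


insertion_ms = estimate_ms(N ** 2)          # n^2 steps
heap_ms = estimate_ms(N * math.log2(N))     # n * log2(n) steps (base 2 assumed)

MS_PER_HOUR = 1000 * 60 * 60

if __name__ == "__main__":
    print(f"insertion sort: {insertion_ms / MS_PER_HOUR:,.0f} hours")
    print(f"heapsort:       {heap_ms / MS_PER_HOUR:,.1f} hours")
```

Under these assumptions the gap is several decades against a few hours. With a real implementation's constants, such as the fictitious ones above, the absolute numbers would be very different.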

Resources