How to calculate runtime complexity of the nested loops with variable length - algorithm

Suppose I have a task to write an algorithm that runs through an array of strings and checks whether each value in the array contains the character 'c'. The algorithm will have two nested loops; here is the pseudocode:
for (let i = 0; i < a.length; i++) {
  for (let j = 0; j < a[i].length; j++) {
    if (a[i][j] === 'c') {
      // do something
    }
  }
}
Now, the task is to identify the runtime complexity of the algorithm. Here is my reasoning:
let the number of elements in the array be n, and the maximum length of the string values be m. Then the general formula for the complexity is
n x m
Now the possible cases.
If the maximum length of string values is equal to the number of elements, I get the complexity:
n^2
If the maximum length of elements is less than the number of elements by some number a, the complexity is
n x (n - a) = n^2 - na
If the maximum length of elements is more than the number of elements by some number a, the complexity is
n x (n + a) = n^2 + na
Since we discard lower growth functions, it seems that the complexity of the algorithm is n^2. Is my reasoning correct?

Your time complexity is just the total number of characters. Which of the analyses is applicable depends entirely on which of your assumptions about the relationship between the length of the strings and the number of strings holds true. Note in particular that your statement that the time complexity is N x M, where M is the length of the longest string in the array, is not correct as a tight characterization (it's correct in the sense that it places an upper bound, but that upper bound is not tight, so it's not very interesting; it's correct in the same sense that N^2 x M^2 is correct).
I think that in many real cases of interest, your analysis is incorrect. The total number of characters is equal to the number of strings times the average number of characters per string, i.e. the average word length (note: average, not maximum!). As the number of strings becomes large, the average sampled word length will approach the mean of whatever distribution you are sampling from. So at least for any well-behaved distribution where the sampling is i.i.d., the time complexity is simply N.
A good practical example is a database that stores names. It depends, of course, on which people happen to be in the database, but if you are storing the names of, say, American citizens, then as N becomes large, the number of inner operations will approach N times the average number of characters in a name in the US. The latter quantity doesn't depend on N at all, so it's linear in N.
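To see this concretely, here is a small JavaScript sketch (my own illustration, not part of the answer) that counts how many times the inner loop body runs; the count equals the total number of characters, not n times the longest string:

// One inner-loop execution per character: the total is the sum of a[i].length
// over all i (i.e. a.join('').length), not a.length times the longest string.
function countCharacterChecks(a) {
  let checks = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < a[i].length; j++) {
      checks++;               // one check of a[i][j] per character
    }
  }
  return checks;
}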

Related

Parity of permutation with parallelism

I have an integer array of length N containing the values 0, 1, 2, ..., (N-1), representing a permutation of integer indexes.
What's the most efficient way to determine if the permutation has odd or even parity, given that I also have O(N) parallel processors available?
For example, you can sum N numbers in log(N) with parallel computation. I expect to be able to find the parity of a permutation in log(N) as well, but cannot seem to find an algorithm. I also do not know what this "complexity order with parallel computation" is called.
The number in each array slot is the proper slot for that item. Think of it as a direct link from the "from" slot to the "to" slot. An array like this is very easy to sort in O(N) time with a single CPU just by following the links, so it would be a shame to have to use a generic sorting algorithm to solve this problem. Thankfully...
You can do this easily in O(log N) time with Ω(N) CPUs.
Let A be your array. Since each array slot has a single link out (the number in that slot) and a single link in (that slot's number is in some slot), the links break down into some number of cycles.
The parity of the permutation is the oddness of N-m, where N is the length of the array and m is the number of cycles, so we can get your answer by counting the cycles.
First, make an array S of length N, and set S[i] = i.
Then:
Repeat ceil(log_2(N)) times:
    foreach i in [0,N), in parallel:
        if S[i] < S[A[i]] then:
            S[A[i]] = S[i]
        A[i] = A[A[i]]
When this is finished, every S[i] will contain the smallest index in the cycle containing i. The first pass of the inner loop propagates the smallest S[i] to the next slot in the cycle by following the link in A[i]. Then each link is made twice as long, so the next pass will propagate it to 2 new slots, etc. It takes at most ceil(log_2(N)) passes to propagate the smallest S[i] around the cycle.
Let's call the smallest slot in each cycle the cycle's "leader". The number of leaders is the number of cycles. We can find the leaders just like this:
foreach i in [0,N), in parallel:
    if (S[i] == i) then:
        S[i] = 1 // leader
    else:
        S[i] = 0 // not a leader
Finally, we can just add up the elements of S to get the number of cycles in the permutation, from which we can easily calculate its parity.
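For reference, here is a plain sequential JavaScript sketch (my own illustration) of the parity-from-cycle-count formula above; the pointer-jumping procedure computes the same quantity in O(log N) parallel time:

// Sequential cycle count: parity = (N - number of cycles) mod 2, per the formula above.
function permutationParity(perm) {
  const n = perm.length;
  const visited = new Array(n).fill(false);
  let cycles = 0;                           // this is m in "oddness of N - m"
  for (let i = 0; i < n; i++) {
    if (visited[i]) continue;
    cycles++;
    let j = i;
    while (!visited[j]) {                   // walk the cycle that contains i
      visited[j] = true;
      j = perm[j];
    }
  }
  return (n - cycles) % 2;                  // 0 = even permutation, 1 = odd
}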
You didn't specify a machine model, so I'll assume that we're working with an EREW PRAM. The complexity measure you care about is called "span", the number of rounds the computation takes. There is also "work" (number of operations, summed over all processors) and "cost" (span times number of processors).
From the point of view of theory, the obvious answer is to modify an O(log n)-depth sorting network (AKS or Goodrich's Zigzag Sort) to count swaps, then return (number of swaps) mod 2. The code is very complex, and the constant factors are quite large.
A more practical algorithm is to use Batcher's bitonic sorting network instead, which raises the span to O(log^2 n) but has reasonable constant factors (such that people actually use it in practice to sort on GPUs).
I can't think of a practical deterministic algorithm with span O(log n), but here's a randomized algorithm with span O(log n) with high probability. Assume n processors and let the (modifiable) input be Perm. Let Coin be an array of n Booleans.
In each of O(log n) passes, the processors do the following in parallel, where i ∈ {0…n-1} identifies the processor, and swaps ← 0 initially. Lower case variables denote processor-local variables.
Coin[i] ← true with probability 1/2, false with probability 1/2
(barrier synchronization required in asynchronous models)
if Coin[i]
    j ← Perm[i]
    if not Coin[j]
        Perm[i] ← Perm[j]
        Perm[j] ← j
        swaps ← swaps + 1
    end if
end if
(barrier synchronization required in asynchronous models)
Afterwards, we sum up the local values of swaps and mod by 2.
Each pass reduces the number of i such that Perm[i] ≠ i by 1/4 of the current total in expectation. Thanks to the linearity of expectation, the expected total is at most n(3/4)^r, so after r = 2·log_{4/3}(n) = O(log n) passes, the expected total is at most 1/n, which in turn bounds the probability that the algorithm has not converged to the identity permutation as required. On failure, we can just switch to the O(n)-span serial algorithm without blowing up the expected span, or just try again.
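Here is a serial JavaScript sketch of the randomized procedure (my own simulation of the parallel passes, run one processor at a time; the parity bookkeeping is the same, although the convergence analysis above strictly applies to the parallel model):

function randomizedParitySketch(perm) {
  const p = perm.slice();
  const n = p.length;
  // pass budget taken from the r = 2*log_{4/3}(n) analysis above, plus slack
  const maxPasses = Math.ceil(2 * Math.log(n || 1) / Math.log(4 / 3)) + 1;
  let swaps = 0;
  for (let pass = 0; pass < maxPasses; pass++) {
    // flip a fair coin for every index (done in parallel in the real algorithm)
    const coin = Array.from({ length: n }, () => Math.random() < 0.5);
    for (let i = 0; i < n; i++) {
      if (coin[i]) {
        const j = p[i];
        if (!coin[j]) {
          p[i] = p[j];          // apply a transposition that fixes j in place...
          p[j] = j;             // ...each such step flips the permutation's parity
          swaps++;
        }
      }
    }
    if (p.every((v, k) => v === k)) return swaps % 2;   // identity reached
  }
  return null;   // unlucky: fall back to a serial O(n) cycle count, as suggested above
}

On a null return you would finish with a serial parity computation (e.g. the permutationParity sketch under the previous answer, adding swaps mod 2 to its result).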

Calculating bigO runtime with 2D values, where one dimension has unknown length

I was working on the water-collection between towers problem, and trying to calculate the bigO of my solution for practice.
At one point, I build a 2D array of 'towers' from the user's input array of heights. This step uses a nested for loop, where the inner loop runs height many times.
Is my big-O for this step then n * maxHeight?
I've never seen any sort of big-O that used a variable like this, but then again I'm pretty new, so that could be an issue of experience.
I don't feel like the height issue can be written off as a constant, because there's no reason that the height of the towers wouldn't exceed the number of towers on a regular basis.
// convert towerArray into a 2D array representing towers
var multiTowerArray = [];
for (var i = 0; i < towerArray.length; i++) {
  // towerArray is the user-input array of tower heights
  multiTowerArray.push([]);
  for (var j = 0; j < towerArray[i]; j++) {
    multiTowerArray[i].push(1);
  }
}
For starters, it's totally reasonable - and not that uncommon - to give the big-O runtime of a piece of code both in terms of the number of elements in the input as well as the size of the elements in the input. For example, counting sort runs in time O(n + U), where n is the number of elements in the input array and U is the maximum value in the array. So yes, you absolutely could say that the runtime of your code is O(nU), where n is the number of elements and U is the maximum value anywhere in the array.
Another option would be to say that the runtime of your code is O(n + S), where S is the sum of all the elements in the array, since the aggregate number of times that the inner loop runs is equal to the sum of the array elements.
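As a quick sanity check (my own instrumentation, not part of the original code), counting the inner-loop iterations directly shows that the total work is exactly the sum S of the heights:

// Counting inner-loop iterations: the total equals the sum of the heights in
// towerArray, i.e. the S in the O(n + S) bound above. (Illustration only.)
function countInnerIterations(towerArray) {
  let pushes = 0;
  for (let i = 0; i < towerArray.length; i++) {
    for (let j = 0; j < towerArray[i]; j++) {
      pushes++;                             // one push per unit of height
    }
  }
  const s = towerArray.reduce((a, b) => a + b, 0);
  console.log(pushes === s);                // true: the inner work is exactly S
  return pushes;
}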
Generally speaking, you can express the runtime of an algorithm in terms of whatever quantities you'd like. Many graph algorithms have a runtime that depends on both the number of nodes (often denoted n) and the number of edges (often denoted m), such as Dijkstra's algorithm, which can be made to run in time O(m + n log n) using a Fibonacci heap. Some algorithms have a runtime that depends on the size of the output (for example, the Aho-Corasick string matching algorithm runs in time O(m + n + z), where m and n are the lengths of the input strings and z is the number of matches). Some algorithms depend on a number of other parameters - as an example, the count-min sketch performs updates in time O(ε^(-1) log δ^(-1)), where ε and δ are parameters specified when the algorithm starts.

Big-O complexity of calculation: Drawing a non-colliding subset of k elements from n total with dumb algorithm

I'm trying to understand the computational complexity of this pseudocode:
values is a set of n unique elements
subset is an empty set
for 0 ... k
    X: randomly select a value from values
    if value is in subset
        goto X
    else
        insert value into subset
This is of course a (poor) algorithm for selecting a unique random subset of k elements from n, and I'm aware of the better choices, but I wanted to understand the computational complexity of this.
I can easily see that this is O(k) when duplicates are allowed, because the conditional test is eliminated from the pseudocode and exactly k choices are made.
When you have to account for duplicates, there is a probability that a re-test will be required, which increases with each iteration. Depending on the values of n and k this is non-negligible, but I'm not certain how it affects the big-O complexity in a generalized way. Could someone explain this to me?
When subset already contains j values, the probability of inserting a newly drawn value into subset is (n-j)/n.
The expected number of iterations of that round of the loop is inversely proportional to that probability, i.e. n/(n-j).
Therefore the expected cost of that round is O(n/(n-j) + 1), where the 1 is the 'insert value into subset'.
You have to sum ((n/(n-j)) + 1) over j from 0 through k-1, so the final answer is
O( sum_{j=0}^{k-1} (n/(n-j) + 1) ) = O( n*(H_n - H_{n-k}) + k ), which for k = n is the familiar coupon-collector bound O(n log n).
Disclaimer: this assumes that big-O is applicable to procedures that use random number generation (since step X is random), i.e. that we are talking about expected running time.
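To make the summation concrete, here is a small helper (a hypothetical function of my own, evaluating the expectation rather than a worst-case bound) that computes the expected number of draws numerically:

function expectedDraws(n, k) {
  let total = 0;
  for (let j = 0; j < k; j++) {
    total += n / (n - j);      // expected draws to obtain the (j+1)-th distinct value
  }
  return total;                // equals n * (H_n - H_{n-k}); for k = n this is n * H_n
}
// expectedDraws(100, 100) ≈ 518.7, i.e. 100 * H_100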

Why is counting sort not used for large inputs? [duplicate]

Counting sort is a sorting algorithm with an average time complexity of O(n+K), and it assumes that each input element is an integer in the range 0 to K.
Why can't we do a linear search for the maximum value in the unsorted array, set it as K, and then apply counting sort on it?
In the case where your inputs are arrays with maximum - minimum = O(n log n) (i.e. the range of values is reasonably restricted), this actually makes sense. If this is not the case, a standard comparison-based sort algorithm or even an integer sorting algorithm like radix sort is asymptotically better.
To give you an example, the following algorithm generates a family of inputs on which counting sort has runtime complexity Θ(n^2):
from random import shuffle

def generate_input(n):
    array = []
    for i in range(1, n + 1):
        array.append(i * i)   # values up to n^2, so K = n^2 and O(n + K) = Theta(n^2)
    shuffle(array)
    return array
The heading of your question is: Why is counting sort not used for large inputs?
What do we do in counting sort? We take another array (say b[]) and initialize all of its elements to zero. Then, for each element of the given array, we increment the count at that index in b[]. Finally, we run a loop from the lower limit to the upper limit of the value range and check whether b[index] is zero or not; if it is not zero, that index is an element of the given array.
Now, if the difference between these two (the upper limit and the lower limit) is very high (like 10^9 or more), then that single loop alone is enough to kill our PC. :)
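For concreteness, here is a minimal counting-sort sketch in JavaScript (my own illustration, restricted to non-negative integers); the second loop runs K + 1 times no matter how small n is, which is exactly the problem when the value range is huge:

function countingSort(a) {
  if (a.length === 0) return [];
  const K = Math.max(...a);                  // largest value, found by one linear scan
  const counts = new Array(K + 1).fill(0);   // this allocation alone costs Theta(K)
  for (const x of a) counts[x]++;
  const out = [];
  for (let v = 0; v <= K; v++) {             // K + 1 iterations, independent of n
    for (let c = 0; c < counts[v]; c++) out.push(v);
  }
  return out;
}
// With values around 10^9, `counts` needs a billion slots even for a 10-element array.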
According to the definition of big-O notation, if we say f(n) ∈ O(g(n)), it means that there exist constants C > 0 and N such that f(n) ≤ C·g(n) for all n ≥ N. Nothing is said about how large the value of C is, nor about the N from which the inequality holds.
In any algorithm analysis, the cost of each operation of the underlying machine (compare, move, add, etc.) must be considered. The values of these costs are the defining factors of how big (or small) C and N must be for the inequality to hold. Removing these costs is a naive simplification I myself used to make during the algorithm analysis course.
The statement "counting sort is O(n+K)" really means that the running time is at most C·(n + K) for some constant C and all n > N; it behaves linearly in n only on inputs where n > K. Thus other algorithms may have better performance, even for smaller inputs, because the inequality is informative only when those conditions hold.

random merge sort

I was given the following question in an algorithms book:
Suppose a merge sort is implemented to split a file at a random position, rather than exactly in the middle. How many comparisons would be used by such a method to sort n elements on average?
Thanks.
To guide you to the answer, consider these more specific questions:
Assume the split is always at 10%, or 25%, or 75%, or 90%. In each case: what's the impact on the recursion depth? How many comparisons need to be made per recursion level?
I partially agree with @Armen: they should be comparable.
But: consider the case when they are split in the middle. To merge two lists of length n we would need 2*n - 1 comparisons (sometimes fewer, but we'll treat it as fixed for simplicity), each of them producing the next value. There would be log2(n) levels of merges, which gives us approximately n*log2(n) comparisons.
Now consider the random-split case: the maximum number of comparisons needed to merge a list of length n1 with one of length n2 is n1 + n2 - 1. However, the average number will be close to that, because even for the most unlucky split, 1 and n-1, we'll need an average of n/2 comparisons. So we can consider the cost of merging per level to be the same as in the even case.
The difference is that in the random case the number of levels will be larger, and we can consider that the n for the next level will be max(n1, n2) instead of n/2. This max(n1, n2) will tend to be 3*n/4, which gives us the approximate formula
n*log_{4/3}(n) // log in base 4/3
that gives us
n * log2(n) / log2(4/3) ~= 2.4 * n * log2(n)
This result is still larger than the correct one because we ignored the fact that the smaller list will have fewer levels, but it should be close enough. I suppose the correct answer is that the average number of comparisons will roughly double.
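If you want to check the estimate empirically, here is a small JavaScript simulation sketch (my own, not from the answer) of merge sort with a uniformly random split point that counts element comparisons, so the count can be compared against n*log2(n):

function randomMergeSort(a, count = { comparisons: 0 }) {
  if (a.length <= 1) return { sorted: a, count };
  // pick a split point uniformly in [1, a.length - 1]
  const split = 1 + Math.floor(Math.random() * (a.length - 1));
  const left = randomMergeSort(a.slice(0, split), count).sorted;
  const right = randomMergeSort(a.slice(split), count).sorted;
  const merged = [];
  let i = 0, j = 0;
  while (i < left.length && j < right.length) {
    count.comparisons++;                     // one element comparison per step
    merged.push(left[i] <= right[j] ? left[i++] : right[j++]);
  }
  return { sorted: merged.concat(left.slice(i), right.slice(j)), count };
}
// const { count } = randomMergeSort(shuffledArray);
// count.comparisons can then be compared with shuffledArray.length * Math.log2(shuffledArray.length)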
You can get an upper bound of 2n * H_{n-1}, which is approximately 2n ln n, using the fact that merging two lists of total length n costs at most n comparisons. The analysis is similar to that of randomized quicksort (see http://www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf).
First, suppose we split a list of length n into 2 lists L and R. We will charge the first element of R for a comparison against all of the elements of L, and the last element of L for a comparison against all elements of R. Even though these may not be the exact comparisons that are executed, the total number of comparisons we are charging for is n as required.
This handles one level of recursion, but what about the rest? We proceed by concentrating only on the "right-to-left" comparisons that occur between the first element of R and every element of L at all levels of recursion (by symmetry, this will be half the actual expected total). The probability that the jth element is compared to the ith element is 1/(j - i) where j > i. To see this, note that element j is compared with element i exactly when it is the first element chosen as a "splitting element" from among the set {i + 1,..., j}. That is, elements i and j are split into two lists as soon as the list they are in is split at some element from {i + 1,..., j}, and element j is charged for a comparison with i exactly when element j is the element that is chosen from this set.
Thus, the expected total number of comparisons involving j is at most H_{n-1} (i.e., 1 + 1/2 + 1/3 + ..., where the number of terms is at most n - 1). Summing across all possible j gives n * H_{n-1}. This only counted "right-to-left" comparisons, so the final upper bound is 2n * H_{n-1}.
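To put a number on that bound, here is a tiny helper (my own, purely for illustration) that evaluates 2n * H_{n-1} directly:

function mergeSortComparisonBound(n) {
  let h = 0;
  for (let i = 1; i < n; i++) h += 1 / i;   // H_{n-1} = 1 + 1/2 + ... + 1/(n-1)
  return 2 * n * h;                         // upper bound on expected comparisons
}
// mergeSortComparisonBound(1000) ≈ 14968, in the same ballpark as 2 * 1000 * ln(1000) ≈ 13816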

Resources