Summing an Array and Big O Notation

How do I find an algorithm for calculating the sum of the values in an array?
Is it something like this?
Algorithm Array Sum
Input: nonnegative integer N, and array A[1], A[2], ..., A[N]
Output: sum of the N integers in array A
Algorithm Body:
    j := 1
    sum := 0
    while j < N
        sum := sum + A[j]
        j := j + 1
    end while
end Algorithm Array Sum
And how can I relate it to the running time of the algorithm using O-notation?
This is from a past year's exam, and I need to revise for my exam.
Question
An array A[] holding n integer values is given.
1. Give an algorithm for calculating the sum of all the values in the array.
2. Find the simplest and best O-notation for the running time of the algorithm.

The question is to find the sum of all the values, so iterate through each element in the array and add each element to a temporary sum value.
temp_sum = 0
for x in array:          # visit every element exactly once
    temp_sum += x
Since you need to go through all the elements in the array, this program depends linearly on the number of elements. If you have 10 elements, you iterate through 10; if you have a million, you have no choice but to go through all million elements and add each of them. Thus the time complexity is Θ(n).
If you are finding the sum of all the elements and you don't know anything about the data, then you need to look at every element at least once; thus n is a lower bound. You also need not look at any element more than once, so n is also an upper bound. Hence the complexity is Θ(n).
However, if you know something about the elements - say you get the sequence of the first n natural numbers - you can do it in constant time with n(n+1)/2. If the data you get are arbitrary, then you have no choice but to use the linear-time algorithm above.
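For instance, a minimal sketch of that special case (illustrative names, not from the question):

def sum_first_n(n):
    # Closed form for 1 + 2 + ... + n; no pass over the data is needed.
    return n * (n + 1) // 2

print(sum_first_n(10))       # 55
print(sum(range(1, 11)))     # 55, computed the linear-time way for comparison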

Since n is the size of the array and all you have to do is iterate from beginning to end, the Big O notation is O(n).
integer N = Size_array
array a[N]
j = 1
sum = 0
while j <= N
    sum += a[j]
    j++
end while

I think that you meant "while j <= N"; you need to specify this.
The running time will be O(n), I think, as you have only one loop.

To calculate O for this algorithm you need to count the number of times each line of code executes. Later on you will only count the fundamental operations, but start by counting all of them.
So how many times will the j := 1 line run? How many times will sum := 0 run?
How many times will the while loop's condition execute? The statements inside the while loop?
Sum these all up. You will notice that the value you get is something like 1 + 1 + (n+1) + n + n = 3n + 3 (the condition is tested once more than the body, on the final failing check); thus you can conclude that it is a linear function of n.
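To make that count concrete, here is a small instrumented sketch (the counter names are mine) that tallies the executions for a five-element array:

def instrumented_sum(a):
    counts = {"init": 0, "test": 0, "body": 0}
    counts["init"] += 2              # j := 1 and sum := 0 each run once
    j, total = 0, 0
    while True:
        counts["test"] += 1          # the loop condition runs n + 1 times
        if j >= len(a):
            break
        total += a[j]
        j += 1
        counts["body"] += 2          # the two body statements run n times each
    return total, counts

print(instrumented_sum([3, 1, 4, 1, 5]))   # (14, {'init': 2, 'test': 6, 'body': 10}), i.e. 3n + 3 = 18 operations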

Related

Time complexity on multiple variables & functions

I have written an algorithm to read in a text file and extract the contents into two arrays, then sort them. The program works, but I am confused about calculating the time complexity. I just need someone to clarify this.
Say I have two functions, a main and a helper.
Helper function
insertion(int array[], int length)
    ...
Main function
int main()
    while(...)          // this while loop reads the input text file and pushes integers into a vector
        ...
    while(...)
        ...
        if(...)
            for(...)    // this for loop validates array B only
    insertion(arrayA, lengthA)
    insertion(arrayB, lengthB)
The program reads in the text file
Pushes line 1 to array A, line 2 to array B
A 'for loop' validates array B's integers, with an outer 'if'
Performs insertion sort on array A and array B
From what I learnt, I have to let the number of data items be 'n' before calculating the Big-O or the number of operations. Now, obviously there are two data sets here - one for array A and one for array B.
So, let array A have n elements and array B have m.
However, I am unsure whether the number of data items in the helper function should be 'n' or 'm'. Likewise for the nested while loop, whether the number of data items should be 'n' or 'm'.
I tried my best to explain my difficulty in understanding this time complexity along with a simplified form of my program (the actual program has tons of loops...). Hopefully someone can understand what I mean and provide some clarification; otherwise I will modify it further to see if I can make it clearer. Thanks!
Edit: I am required to calculate the number of operations before finding the Big-O for my algorithm.
I understand that after you read the file, you will have arrays A and B.
If m and n are close, then you can say that m = n. Otherwise, you choose the bigger one and call it n.
Then you read n elements two times: n + n = 2n, but in big O you can drop the constant, so at this point you have O(n) time.
If validation only passes through array B once, then you have 3n operations, but 3 is still a constant, so the time complexity is still O(n).
However, the worst case insertion sort can hit is O(n^2). You do it two times: n^2 + n^2 = 2n^2; two is a constant, so the insertion sort piece takes O(n^2).
Finally, you have O(n) + O(n^2). Since it's big-O notation, the most costly part is the significant one: O(n^2) is your complexity.
For example, if you ran insertion sort n times, then you'd have O(n·n^2) time, which is O(n^3).
A computer does roughly 10^9 operations per second, so small n doesn't count for much.
If you are not sure whether n and m are close, let's say that 0 < n < 10^9 and 0 < m < 10^3. You'd say the time complexity of reading the input is O(n+m), and insertion sort is O(n^2) + O(m^2). But here, since m << n (m is much less than n), you can choose not to consider m (m here is almost negligible IF YOU ARE not being strict!). If you need to be strict, do not ignore these small terms at first.
If 0 < n < 10^9 and 0 < m < 10^9, then you shouldn't say m = n or ignore either one, because n could be one and m a million.
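Putting the strict version together, assuming array A holds n elements and array B holds m (the labels from the question):

    O(n + m)            [read the file]
  + O(m)                [validate array B]
  + O(n^2) + O(m^2)     [two worst-case insertion sorts]
  = O(n^2 + m^2)

which collapses to O(n^2) whenever m <= n.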

find four integers that sum to a given value

Given a set S of n integers from ℤ.
All the integers in S are in the range 0 to ⌈n lg n⌉.
I need to find whether there are 4 integers in S whose sum equals a given number X.
I know there is a solution of O(n^2) using hashing, and there are some other solutions of O(n^2 lg n).
Is there any chance that this specific question can be solved in O(n^2) time without using hashing?
If using hashing, is O(n^2) the worst case, or the expected case?
Pseudocode:
sums = array of vectors of size (n log n) * 2
for index1, num1 in numbers:
    for index2, num2 in numbers:
        sums[num1 + num2].append((index1, index2))
idx1 = 0
idx2 = x
while idx1 < idx2:
    if any pair in sums[idx1] doesn't share an index with some pair in sums[idx2]:
        any of those combinations generates the sum x
    idx1 += 1
    idx2 -= 1
As the numbers are in a small range, we don't need hashing; we can use simple arrays with the sum as the index. Generating sums takes O(n^2) time. Checking whether a sum exists is also O(n^2), because we never look at the candidates in sums[idx] more than once each, and there are n^2 such candidates. This part can probably be improved but won't help the overall complexity.
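A hedged Python sketch of this approach (the names are mine; the disjointness test below is the naive form of the prose check, enough to show the idea even though the tight O(n^2) bound rests on the sharper counting argument, and the loop also handles the middle bucket where idx1 == idx2):

import math

def find_disjoint(pairs1, pairs2):
    # Naive check: one pair from each bucket using four distinct indices.
    for i, j in pairs1:
        for k, l in pairs2:
            if len({i, j, k, l}) == 4:
                return i, j, k, l
    return None

def four_sum_indices(numbers, x):
    # Values are assumed to lie in [0, ceil(n lg n)], so pair sums fit in a
    # plain array indexed by the sum; no hashing is needed.
    n = len(numbers)
    max_val = math.ceil(n * math.log2(n)) if n > 1 else 1
    sums = [[] for _ in range(2 * max_val + 1)]
    for i in range(n):
        for j in range(i + 1, n):        # each unordered index pair once
            sums[numbers[i] + numbers[j]].append((i, j))
    hi = min(x, len(sums) - 1)
    lo = x - hi
    while 0 <= lo <= hi:
        hit = find_disjoint(sums[lo], sums[hi])
        if hit:
            return hit                   # four indices whose values sum to x
        lo += 1
        hi -= 1
    return None

print(four_sum_indices([1, 3, 7, 2, 9, 5], 20))   # (0, 1, 2, 4): 1 + 3 + 7 + 9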

How many times variable m is updated

Given the following pseudo-code, the question is how many times, on average, the variable m is updated.
A[1...n]: array with n random elements
m = A[1]
for i = 2 to n do
    if A[i] < m then m = A[i]
end for
One might answer that since all elements are random, the variable will on average be updated in half of the iterations of the for loop, plus one for the initialization.
However, I suspect that there must be a better (and possibly the only correct) way to prove it, using the binomial distribution with p = 1/2. This way, the average number of updates of m would be
M = 1 + Σ_{k=1}^{n−1} k · C(n,k) · p^k · (1−p)^(n−k)
where C(n,k) is the binomial coefficient. I have tried to solve this but got stuck a few steps in, since I do not know how to continue.
Could someone explain me which of the two answers is correct and if it is the second one, show me how to calculate M?
Thank you for your time
Assuming the elements of the array are distinct, the expected number of updates of m is the nth harmonic number, Hn, which is the sum of 1/k for k ranging from 1 to n.
The summation formula can also be represented by the recursion:
H1 = 1
Hn = Hn−1 + 1/n   (n > 1)
It's easy to see that the recursion corresponds to the problem.
Consider all permutations of n−1 numbers, and assume that the expected number of assignments is Hn−1. Now, every permutation of n numbers consists of a permutation of n−1 numbers, with a new smallest number inserted in one of n possible insertion points: either at the beginning, or after one of the n−1 existing values. Since it is smaller than every number in the existing series, it will only be assigned to m in the case that it was inserted at the beginning. That has a probability of 1/n, and so the expected number of assignments of a permutation of n numbers is Hn−1 + 1/n.
Since the expected number of assignments for a vector of length one is obviously 1, which is H1, we have an inductive proof of the recursion.
Hn is asymptotically equal to ln n + γ, where γ is the Euler-Mascheroni constant, approximately 0.577. So it increases without limit, but quite slowly.
The values for which m is updated are called left-to-right maxima, and you'll probably find more information about them by searching for that term.
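A quick empirical check (my own sketch, not from the answer): simulate the scan on random distinct arrays and compare the average number of assignments with Hn.

import random

def count_updates(a):
    # Number of assignments to m, counting the initialization.
    m, updates = a[0], 1
    for x in a[1:]:
        if x < m:
            m, updates = x, updates + 1
    return updates

n, trials = 20, 100_000
total = sum(count_updates(random.sample(range(10**6), n)) for _ in range(trials))
H_n = sum(1 / k for k in range(1, n + 1))
print(total / trials, H_n)   # both should land near 3.60 for n = 20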
I liked @rici's answer, so I decided to elaborate its central argument a little bit more to make it clearer to myself.
Let H[k] be the expected number of assignments needed to compute the min m of an array of length k, as indicated in the algorithm under consideration. We know that
H[1] = 1.
Now assume we have an array of length n > 1. The min can be in the last position of the array or not. It is in the last position with probability 1/n. It is not with probability 1 - 1/n. In the first case the expected number of assignments is H[n-1] + 1. In the second, H[n-1].
If we multiply the expected number of assignments of each case by their probabilities and sum, we get
H[n] = (H[n-1] + 1)*(1/n) + H[n-1]*(1 - 1/n)
     = H[n-1]*(1/n) + 1/n + H[n-1] - H[n-1]*(1/n)
     = 1/n + H[n-1]
which shows the recursion.
Note that the argument is valid only because the min is either in the last position or in one of the first n−1 positions, never in both. Thus we are using the fact that all the elements of the array are distinct.

Sample number with equal probability which is not part of a set

I have a number n and a set of numbers S ⊆ [1..n] with size s (which is substantially smaller than n). I want to sample a number k ∈ [1..n] with equal probability, but the number is not allowed to be in the set S.
I am trying to solve the problem in at worst O(log n + s). I am not sure whether that's possible.
A naive approach is creating an array of the numbers from 1 to n excluding all numbers in S and then picking one array element. This runs in O(n) and is not an option.
Another approach may be to just generate random numbers ∈ [1..n] and reject them if they are contained in S. This has no worst-case bound, as numbers in S can be drawn repeatedly, but on average it might be a practical solution if s is substantially smaller than n.
Say s is sorted. Generate a random number between 1 and n-s, call it k. We've chosen the k'th element of {1,...,n} - s. Now we need to find it.
Use binary search on s to find the count of the elements of s that are <= k. This takes O(log |s|). Add this count to k. In doing so, we may have passed or arrived at additional elements of s. We can adjust for this by incrementing our answer for each such element that we pass, which we find by checking the next larger element of s from the point found by our binary search.
E.g., n = 100, s = {1,4,5,22}, and our random number is 3. So our approach should return the third element of [2,3,6,7,...,21,23,24,...,100], which is 6. Binary search finds that 1 element is at most 3, so we increment to 4. Now we compare to the next larger element of s, which is 4, so we increment to 5. Repeating this finds 5 in s, so we increment to 6. We check s once more, see that 6 isn't in it, and stop.
E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100], which is 7. Binary search finds that 2 elements are at most 4, so we increment to 6. Now we compare to the next larger element of s, which is 5, so we increment to 7. We check s once more, see that the next number is > 7, and stop.
If we assume that "s is substantially smaller than n" means |s| <= log(n), then we will increment at most log(n) times, and in any case at most s times.
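A sketch of this approach in Python (my naming; bisect plays the role of the binary search), assuming s is given as a sorted list:

import bisect
import random

def sample_excluding_sorted(n, s):
    # s: sorted list of forbidden values; pick the k-th element of {1..n} - s.
    k = random.randint(1, n - len(s))
    i = bisect.bisect_right(s, k)    # how many forbidden values are <= k
    k += i                           # skip past them...
    while i < len(s) and s[i] <= k:  # ...and past any we land on while skipping
        k += 1
        i += 1
    return k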
If s is not sorted, then we can do the following. Create an array of bits of size s. Generate k. Scan s and do two things: 1) count the number of elements < k; call this r. At the same time, set the i'th bit to 1 if k+i is in s (0-indexed, so if k is in s then the first bit is set).
Now, increment k a number of times equal to r plus the number of set bits in the array whose index is <= the number of increments made so far.
E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100], which is 7. We scan s and 1) note that 1 element is below 4 (r=1), and 2) set our array to [1, 1, 0, 0]. We increment once for r=1 and an additional two times for the two set bits, ending up at 7.
This is O(s) time, O(s) space.
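A sketch of this unsorted variant under the same assumptions (the names and the fixed-point loop are my reading of the steps above):

import random

def sample_excluding_unsorted(n, s):
    # s: unsorted list of forbidden values; len(s) << n is assumed.
    k = random.randint(1, n - len(s))
    bits = [0] * len(s)
    r = 0
    for v in s:                      # single O(s) pass: count and mark
        if v < k:
            r += 1                   # forbidden values strictly below k
        elif v - k < len(s):
            bits[v - k] = 1          # forbidden values in [k, k + s - 1]
    inc, i = r, 0
    while i <= inc and i < len(bits):
        inc += bits[i]               # each set bit we pass costs one more skip
        i += 1
    return k + inc

print(sample_excluding_unsorted(100, [22, 5, 1, 4]))   # never returns 1, 4, 5, or 22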
This is an O(1) solution with O(s) initial setup that works by mapping each non-allowed number > s to an allowed number <= s.
Let S be the set of non-allowed values with elements S(i), i ∈ [1..s], where s = |S|.
Here's a two-part algorithm. The first part constructs a hash table based only on S in O(s) time; the second part finds the random value k ∈ {1..n}, k ∉ S in O(1) time, assuming we can generate a uniform random number in a contiguous range in constant time. The hash table can be reused for new random values and also for new n (assuming S ⊂ {1..n} still holds, of course).
To construct the hash H: first set j = 1, then iterate over the elements S(i) of S (they do not need to be sorted). If S(i) > s, first increment j until j ∉ S, then add the key-value pair (S(i), j) to the hash table and increment j once more.
To find a random value k, first generate a uniform random value in the range s + 1 to n, inclusive. If k is a key in H, then set k = H(k). I.e., we do at most one hash lookup to ensure k is not in S.
Python code to generate the hash:
def substitute(S):
    H = dict()
    j = 1
    for s in S:
        if s > len(S):
            while j in S:
                j += 1
            H[s] = j
            j += 1
    return H
For the actual implementation to be O(s), one might need to convert S into something like a frozenset to ensure the test for membership is O(1), and also hoist the len(S) loop invariant out of the loop. Assuming the j in S test and the insertion into the hash (H[s] = j) are constant time, this has complexity O(s).
The generation of a random value is simply:
import random

def myrand(n, s, H):
    k = random.randint(s + 1, n)
    return H[k] if k in H else k
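For instance (an illustrative run, not from the answer):

S = frozenset({1, 4, 5, 22})     # forbidden set, s = 4
H = substitute(S)                # e.g. {5: 2, 22: 3}: disallowed values > s map to free values <= s
k = myrand(100, len(S), H)       # uniform over {1..100} minus S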
If one is only interested in a single random value per S, then the algorithm can be optimized to improve the common case, while the worst case remains the same. This still requires S to be in a hash table that allows a constant-time "element of" test.
def rand_not_in(n, S):
    k = random.randint(len(S) + 1, n)
    if k not in S:
        return k
    j = 1
    for s in S:
        if s > len(S):
            while j in S:
                j += 1
            if s == k:
                return j
            j += 1
Optimizations: only generate the mapping if the random value is in S; don't save the mapping to a hash table; short-circuit the mapping generation when the random value is found.
Actually, the rejection method seems like the practical approach.
Generate a number in 1...n and check whether it is forbidden; regenerate until the generated number is not forbidden.
The probability of a single rejection is p = s/n.
Thus the expected number of random number generations is 1 + p + p^2 + p^3 + ... which is 1/(1-p), which in turn is equal to n/(n-s).
Now, if s is much less than n, or even as large as s = n/2, this expected number is at most 2.
It would take s almost equal to n to make it infeasible in practice.
Multiply the expected time by log s if you use a tree set to check whether the number is in the set, or by just 1 (expected value, again) if it is a hash set. So the average time is O(1) or O(log s), depending on the set implementation. There is also O(s) memory for storing the set, but unless the set is given in some special, implicit and concise way, I don't see how that can be avoided.
(Edit: As per comments, you do this only once for a given set.
If, additionally, we are out of luck, and the set is given as a plain array or list, not some fancier data structure, we get O(s) expected time with this approach, which still fits into the O(log n + s) requirement.)
If attacks against the unbounded algorithm are a concern (and only if they truly are), the method can include a fall-back algorithm for the cases when a certain fixed number of iterations didn't provide the answer.
Similarly to how IntroSort is QuickSort but falls back to HeapSort if the recursion depth gets too high (which is almost certainly a result of an attack resulting in quadratic QuickSort behavior).
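A minimal sketch of the rejection loop (a set provides the expected O(1) membership test mentioned above; the fall-back is omitted):

import random

def sample_by_rejection(n, forbidden):
    # forbidden: a set, for O(1) expected membership tests.
    # Expected draws: n / (n - len(forbidden)), at most 2 when s <= n/2.
    while True:
        k = random.randint(1, n)
        if k not in forbidden:
            return k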
Find all numbers that are in the forbidden set and less than or equal to n−s. Call this array A.
Find all numbers that are not in the forbidden set and greater than n−s. Call this array B. This may be done in O(s) if the set is sorted.
Note that the lengths of A and B are equal; create the mapping map[A[i]] = B[i].
Generate a number t up to n−s. If map[t] exists, return it; otherwise return t.
This works with O(s) insertions into a map plus one lookup, which is either O(s) on average (hash map) or O(s log s) (tree map).
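A sketch of this mapping in Python (names are mine):

import random

def sample_via_remap(n, forbidden):
    # forbidden: collection of s distinct values from {1..n}, with s < n.
    s = len(forbidden)
    cutoff = n - s
    fset = set(forbidden)
    A = [v for v in forbidden if v <= cutoff]                    # forbidden values inside the draw range
    B = [v for v in range(cutoff + 1, n + 1) if v not in fset]   # allowed values above it; len(A) == len(B)
    remap = dict(zip(A, B))
    t = random.randint(1, cutoff)
    return remap.get(t, t)                                       # redirect forbidden draws, pass the rest through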

How to justify the correctness and runtime of an algorithm

How do you go about justifying the correctness and runtime of an algorithm?
For example, say I'm asked to justify the correctness and runtime of an algorithm that is essentially counting sort. I know it runs in worst case O(n), but I don't know how to justify the correctness or prove that the runtime is O(n).
Question: Describe an algorithm to sort n integers, each in the range [0..n^4 − 1], in O(n) time. Justify the correctness and the running time of your algorithm.
Work:
Algorithm to sort n integers in the range [0..(n^4)-1] in O(n):
Represent each int x in the list as its 4 digits in base n
Use countSort with respect to the least significant digit, then countSort with respect to the next least significant digit, and so on
Let k = (n^4)-1, the max value in the range
Since values range from 0..k, create k+1 buckets
Iterate through the list and increment a counter each time a value appears
Fill the input list with the data from the buckets, where each key is a value in the list
From smallest to largest key, add the bucket index to the input array
Variables:
array: list of ints to be sorted
result: output array (indexes from 0..n-1)
n: length of the input
k: value such that all keys are in range 0..k-1
count: array of ints with indexes 0..k-1 (starts with all = 0)
x: single input value
total/oCount: control variables
total = 0
for x in array
    count[key of x]++
for i = 0 to k-1
    oCount = count[i]
    count[i] = total
    total += oCount
for x in array
    result[count[key of x]] = x
    count[key of x]++
return result
The algorithm uses simple loops without recursion. Initializing the count array and the middle for loop that computes prefix sums over the count array iterate at most k+1 times; the 1 is constant, so this takes O(k). The loops that fill the result array from the input array take O(n) time. These total O(n+k) time. k is considered a constant, so the final running time is O(n).
I need some help to point me in the correct direction. Thanks!
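For reference, a runnable sketch of the intended approach (my own Python, not part of the original post): a stable counting sort used as the digit pass of a base-n radix sort, so each pass is O(n + n) and four passes are O(n) overall.

def counting_sort_by_digit(arr, base, exp):
    # Stable counting sort of arr by the digit (x // base**exp) % base.
    count = [0] * base
    for x in arr:
        count[(x // base**exp) % base] += 1
    total = 0
    for d in range(base):                 # prefix sums: starting index per digit
        count[d], total = total, total + count[d]
    result = [0] * len(arr)
    for x in arr:
        d = (x // base**exp) % base
        result[count[d]] = x
        count[d] += 1
    return result

def radix_sort(arr):
    # Sort n integers in [0, n^4 - 1] in O(n): four stable counting-sort
    # passes over the base-n digits, least significant digit first.
    n = max(len(arr), 2)                  # base must be at least 2
    for exp in range(4):                  # n^4 - 1 has at most 4 base-n digits
        arr = counting_sort_by_digit(arr, n, exp)
    return arr

print(radix_sort([170, 45, 75, 90, 2, 24, 1]))   # [1, 2, 24, 45, 75, 90, 170]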
