Determining running time of an algorithm to compare two arrays - algorithm

I want to know how it is possible to determine the run time of an algorithm written in pseudocode so that I can familiarize myself with run time. So, for example, how do you know what the run time is of an algorithm that compares 2 arrays to determine if they are not the same?
Array 1 = [1, 5, 3, 2, 10, 12] Array 2 = [3, 2, 1, 5, 10, 12]
So these two arrays are not the same since they are ordered differently.
My pseudocode is:
1) set current pointer to first number in first array
2) set second pointer to first number in second array
3) while ( current pointer != " ") compare with same position element in other array
4) if (current pointer == second pointer)
move current pointer to next number
move second pointer to next number
5) else (output that arrays are not the same)
end loop
So I am assuming, first off, that my code is correct. I know step 4 executes only once, since it only takes one mismatch to display that the arrays are not the same, so step 4 takes only constant time (1). I know steps 1 and 2 also execute only once.
So far I know the run time is 3 + ? (? being the run time of the loop itself).
Now I am lost on the loop part. Does the loop run n times (n being the number of numbers in the array), since in the worst case every single number might get matched? Am I thinking of run time in the right way?
If someone can help with this, I'll appreciate it.
Thanks!

What you are asking about is called the time-complexity of your algorithm. We talk about the time complexity of algorithms using so-called Big-O notation.
Big-O notation is a method for talking about the approximate number of steps our algorithms take relative to the size of the algorithm's input, in the worst possible case for an input of that size.
Your algorithm runs in O(n) time (pronounced "big-oh of n" or "order n" or sometimes we just say "linear time").
You already know that steps 1,2, and 4 all run in a constant number of steps relative to the size of the array. We say that those steps run in O(1) time ("constant time").
So let's consider step 3:
If there are n elements in the array, then step 3 needs to do n comparisons in the worst case. So we say that step 3 takes O(n) time.
Since the algorithm takes O(n) time on step 3, and all other steps are faster, we say that the total time complexity of your algorithm is O(n).
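For concreteness, here is a short Python sketch of the same single pass (my own illustration, not an exact translation of your pseudocode). In the worst case, when the arrays are identical, the loop performs one comparison per element, which is exactly where the O(n) comes from:

def arrays_match(a, b):
    # Assumes both arrays have the same length, as in the question.
    # One comparison per position; the worst case (equal arrays) touches all n elements.
    for x, y in zip(a, b):
        if x != y:
            print("arrays are not the same")
            return False
    return True

arrays_match([1, 5, 3, 2, 10, 12], [3, 2, 1, 5, 10, 12])  # mismatch found at the very first position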
When we write O(f), where f is some function, we mean that the algorithm runs within some constant factor of f for large values.
Take your algorithm for example. For large values of n (say n = 1000), the algorithm doesn't take exactly n steps. Suppose that a comparison takes 5 instructions to complete in your algorithm, on your machine of choice. (It could be any constant number, I'm just choosing 5 for example.) And suppose that steps 1, 2, 4 all take some constant number of steps each, totalling 10 instructions for all three of those steps.
Then for n = 1000 your algorithm would take:
Steps 1 + 2 + 4 = 10 instructions. Step 3 = 5*1000 = 5000 instructions.
This is a total of 5010 instructions. This is about 5*n instructions, which is a constant factor of n, which is why we say it is O(n).
For very large n, the 10 in f = 5*n + 10 becomes more and more insignificant, as does the 5. For this reason, we simply say that f stays within a constant factor of n for large n, i.e. f is in O(n).
In this way it's easy to express the idea that a quadratic function like f1 = n^2 + 2 is eventually larger than any linear function like f2 = 10000*n + 50000 once n is large enough, by simply writing f1 as O(n^2) and f2 as O(n).

You are correct. The running time is O(n) where n is the number of elements in the arrays. Each time you add 1 element to the arrays, you would have to execute the loop 1 more time in the worst case.

Related

Is the computational complexity of counting runs in cribbage O(N*log(N)) in the worst case?

In the card game cribbage, counting the runs for a hand during the show (one of the stages of a turn in the game) means reporting the longest increasing subsequence that consists only of values that increase by 1. If duplicate values are part of this subsequence, then a double run (or triple, quadruple, et cetera) is reported.
Some examples:
("A","2","3","4","5") => (1,5) Single run for 5
("A","2","3","4","4") => (2,4) Double run for 4
("A","2","3","3","3") => (3,3) Triple run for 3
("A","2","3","4","6") => (1,4) Single run for 4
("A","2","3","5","6") => (1,3) Single run for 3
("A","2","4","5","7") => (0,0) No runs
To address cases that arise with hands larger than the cribbage hand size of 5: a run will be selected if it has the maximum product of the number of duplicates of a subsequence and that subsequence's length.
Some relevant examples:
("A","2","2","3","5","6","7","8","9","T","J") => (1,7) Single run for 7
("A","2","2","3","5","6","7","8") => (2,3) Double run for 3
My method for finding the maximum scoring run is as follows:
1. Create a list of ranks and sort it. O(N*log(N))
2. Create a list to store the length of the maximum run and how many duplicates of it exist. Initialize it to [1 duplicate, 1 long].
3. Create an identical list as above to store the current run.
4. Create a flag that indicates whether the duplicate you've encountered is not the initial duplicate of this value. Initialize it to False.
5. Create a variable to store the increase in duplicate subsequences if additional duplicate values are found after the initial duplicate. Initialize it to 1.
6. Iterate over the differences between adjacent elements. O(N)
7. If the difference is greater than one, the run has ended. Check if the product of the elements of the max run is less than that of the current run and the current run has length 3 or greater. If this is true, the current run becomes the maximum run and the current run list is reset to [1,1]. The flag is reset to False. The increment for duplicate subsequences is reset to 1. Iterate to the next value.
8. If the difference is 1, increment the length of the current run by 1 and set the flag to False. Iterate to the next value.
9. If the difference is 0 and the flag is False, set the increment for duplicate subsequences equal to the current number of duplicates for the run. Then double the number of duplicates for the run and set the flag to True. Iterate to the next value.
10. If the difference is 0 and the flag is True, increase the number of duplicates for the run by the increment for duplicate subsequences value.
11. After the iteration, check the current run list against the max run as in step 7 and set the max run accordingly.
I believe this is O(N*(1+log(N))). I believe this is the best time complexity, but I am not sure how to prove this or what a better algorithm would look like. Is there a way to do this without sorting the list first that achieves a better time complexity? If not, how does one go about proving this is the best time complexity?
Time complexity of an algorithm is a well-traveled path. Formally proving the complexity of an algorithm from scratch is rarely done; rather, the complexity community usually works with modular pseudo-code and standard, well-known results. For instance, a for loop over an input of length N is O(N) (surprise); comparison-based sorting of a list is known to be O(N log N) at best in the general case. For a good treatment, see Big O, how do you calculate/approximate it?.
Note: O(N x (1+log(N))) is slightly sloppy notation. Only the greatest complexity factor -- the one that dominates as N approaches infinity -- is used. Drop the 1+: it's simply O(N log N).
As I suggested in a comment, you can simply count elements. Keep a list of counts, indexed by your card values. For discussing the algorithm, don't use the "dirty" data of character representations: "A23456789TJQK"; simply use their values, either 0-12 or 1-13.
for rank in hand:
    count[rank] += 1
This is a linear pass through the data, O(N).
Now, traverse your array of counts, finding the longest sequence of non-zero values. This is a fixed-length list of 13 elements, touching each element only once: O(1). If you accumulate a list of multiples (the card counts), then you'll also have your combinatoric factors at the end.
The resulting algorithm and code are, therefore, O(N).
For instance, let's shorten this to 7 card values, 0-6. Given the input integers
1 2 1 3 6 1 3 5 0
You make the first pass to count items:
[1 3 1 2 0 1 1]
A second pass gives you a max run length of 4, with counts [1 3 1 2].
You report a run of 4, a triple and a double, or the point count
4 * (1 * 3 * 1 * 2)
You can also count the pair values:
2 * C(3,2) + 2 * C(2,2)
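A rough Python sketch of this counting approach (my own illustration; it assumes the ranks have already been converted to small integers as described above, and it returns the (duplicates, length) pair used in the question):

def best_run(values, num_ranks=13):
    # First pass: count each rank. O(N).
    counts = [0] * num_ranks
    for v in values:
        counts[v] += 1
    # Second pass: scan the fixed-size count array for the longest stretch of
    # non-zero counts; the product of those counts is the number of distinct runs. O(1).
    best_len, best_mult = 0, 0
    run_len, run_mult = 0, 1
    for c in counts + [0]:          # trailing 0 flushes the final run
        if c > 0:
            run_len += 1
            run_mult *= c
        else:
            if run_len >= 3 and run_len * run_mult > best_len * best_mult:
                best_len, best_mult = run_len, run_mult
            run_len, run_mult = 0, 1
    return best_mult, best_len      # (0, 0) if there is no run of length 3 or more

print(best_run([1, 2, 1, 3, 6, 1, 3, 5, 0], num_ranks=7))  # -> (6, 4): six runs of length 4 (a triple and a double)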

What makes this prime factorization so efficient?

I've been doing some Project Euler problems to learn/practice Lua, and my initial quick-and-dirty way of finding the largest prime factor of n was pretty bad, so I looked up some code to see how others were doing it (in an attempt to understand different factoring methodologies).
I ran across the following (originally in Python - this is my Lua):
function Main()
    local n = 102
    local i = 2
    while i^2 < n do
        while n%i==0 do n = n / i end
        i = i+1
    end
    print(n)
end
This factored huge numbers in a very short time - almost immediately. The thing I noticed about the algorithm that I wouldn't have divined:
n = n / i
This seems to be in all of the decent algorithms. I've worked it out on paper with smaller numbers and I can see that it makes the numbers converge, but I don't understand why this operation converges on the largest prime factor.
Can anyone explain?
In this case, i is the prime factor candidate. Consider, n is composed of the following prime numbers:
n = p1^n1 * p2^n2 * p3^n3
When i reaches p1, the statement n = n / i = n / p1 removes one occurrence of p1:
n / p1 = p1^(n1-1) * p2^n2 * p3^n3
The inner while iterates as long as there are p1s in n. Thus, after the iteration is complete (when i = i + 1 is executed), all occurrences of p1 have been removed and:
n' = p2^n2 * p3^n3
Let's skip some iterations until i reaches p3. The remaining n is then:
n'' = p3^n3
Here, we find a first mistake in the code. If n3 is 2, then the outer condition does not hold and we remain with p3^2. It should be while i^2 <= n.
As before, the inner while removes all occurrences of p3, leaving us with n'''=1. This is the second mistake. It should be while n%i==0 and n>i (not sure about the Lua syntax), which keeps the very last occurrence.
So the above code works for all numbers n where the largest prime factor occurs only once, by successively removing all other factors. For all other numbers, the mentioned corrections should make it work, too.
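For illustration, here is a small Python sketch (not the original Lua) that fixes the same two problems in a slightly different but equivalent way: use i*i <= n, remember the last prime that divided n, and treat any leftover cofactor greater than 1 as the answer.

def largest_prime_factor(n):
    # Assumes n >= 2.
    largest = None
    i = 2
    while i * i <= n:              # <=, so a squared largest factor is still reached
        while n % i == 0:
            largest = i            # last prime divided out so far
            n //= i
        i += 1
    if n > 1:                      # the leftover cofactor is itself a prime factor
        largest = n
    return largest

print(largest_prime_factor(102))   # 17
print(largest_prime_factor(72))    # 3 (repeated largest factor handled correctly)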
This eliminates all the known smaller prime factors from n, so n becomes smaller and sqrt(n) can be reached earlier. That is where the performance boost comes from: you no longer need to test numbers up to the square root of the original N. Say n is a million, consisting only of 2's and 5's: naive trial division against all known primes would need to check all primes up to 1000, while dividing out the 2's yields 15625 and dividing out the 5's then yields 1, effectively factoring the big number in two steps. (By the way, your algorithm will then print 1! To fix this, keep track of the last i that divided n and report that instead.) But this is only fast for "common" numbers that have a single large prime factor and a bunch of smaller ones; factoring a number n=p*q where both p and q are primes and are close together won't benefit from this boost.
The n=n/i line works because, if you are seeking a prime other than the i you have just found as a divisor, the quotient is still divisible by that other prime, by the fundamental theorem of arithmetic. Read here: https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic . Also, this only works in your case because i runs upward from 2, so you always divide by primes before their composites. Otherwise, if your number had 3 as its largest prime factor but were also divisible by 2, and you checked against 6 first, you'd spoil the principle of only dividing by primes by accidentally dividing by a composite of the largest prime (say with 72: if you first divide by 6, you'll end up with 2, while the answer is 3).
This algorithm (when corrected) takes O(max(p2,sqrt(p1))) steps to find the prime factorization of n, where p1 is the largest prime factor and p2 is the second largest prime factor. In the case of a repeated largest prime factor, p1=p2.
Knuth and Trabb Pardo studied the behavior of this algorithm in "Analysis of a Simple Factorization Algorithm", Theoretical Computer Science 3 (1976), 321-348. They argued against the usual analysis, such as computing the average number of steps taken when factoring integers up to n. Although a few numbers with large prime factors boost the average value, in a cryptographic context what may be more relevant is that some of the percentiles are quite low. For example, 44.7% of numbers satisfy max(sqrt(p1),p2)<n^(1/3), and 1.2% of numbers satisfy max(sqrt(p1),p2)<n^(1/5).
A simple improvement is to test the remainder for primality after you find a new prime factor. It is very fast to test whether a number is prime. This reduces the time to O(p2) by avoiding the trial divisions between p2 and sqrt(p1). The median size of the second largest prime is about n^0.21. This means it is feasible to factor many 45-digit numbers rapidly (in a few processor-seconds) using this improvement on trial division. By comparison, Pollard-rho factorization on a product of two primes takes O(sqrt(p2)) steps on average, according to one model.
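A sketch of that improvement, with sympy.isprime standing in for whatever fast primality test you prefer (this is my own illustration, not code from the paper):

from sympy import isprime   # stand-in for any fast primality test

def largest_prime_factor_fast(n):
    # Trial division, but after stripping out each new prime factor, check whether
    # the remaining cofactor is already prime; if so, stop early and skip the
    # trial divisions between p2 and sqrt(p1).
    largest = None
    i = 2
    while i * i <= n:
        if n % i == 0:
            while n % i == 0:
                largest = i
                n //= i
            if n == 1:
                return largest
            if isprime(n):
                return n
        i += 1
    return n if n > 1 else largest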

Time complexity Big Oh using Summations

A few questions about deriving expressions to find the runtime using summations.
The Big-Oh time complexity is already given, so using summations to find the complexity is what I am focused on.
So I know that there are 2 instructions that must be run before the first iteration of the loop, and 2 instructions (the comparison and the increment of i) that have to be run on each iteration. Of course, there is only 1 instruction within the for loop. So, deriving, I have 2n + 3; dropping the 3 and the 2, I know the time complexity is O(n).
Here I know how to start writing the summation, but the increment in the for loop is still a little confusing for me.
Here is what I have:
So I know my summation time complexity derivation is wrong.
Any ideas as to where I'm going wrong?
Thank you
Just use n / 2 on the top and i = 1 on the bottom:
The reason it's i = 1 and not i = 0 is that the for loop's condition is i < n, so you need to account for being one off: in the summation, i increases all the way up to n / 2 rather than stopping one short.
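As a worked example (my reading of the loop: the body runs roughly n/2 times at some constant cost c per pass), the summation collapses to a linear function:

\sum_{i=1}^{n/2} c = c \cdot \frac{n}{2} = O(n)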

Greedy Algorithm Optimization

I have the following problem:
Let there be n projects.
Let Fi(x) equal the number of points you will obtain if you spend x units of time working on project i.
You have T units of time to use and may work on any project you would like.
The goal is to maximize the number of points you will earn and the F functions are non-decreasing.
The F functions have diminishing marginal return; in other words, spending an (x+1)-th unit of time working on a particular project yields a smaller increase in total points earned from that project than spending the x-th unit did.
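In symbols (my restatement of the condition above), for every project i and every x >= 1:

F_i(x+1) - F_i(x) \le F_i(x) - F_i(x-1)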
I have come up with the following O(nlogn + Tlogn) algorithm but I am supposed to find an algorithm running in O(n + Tlogn):
sum = 0
schedule[]
gain[] = sort(Fi(1))
for sum < T
    getMax(gain) // assume that the max gain corresponds to project "P"
    schedule[P]++
    sum++
    gain.sortedInsert(Fp(schedule[P] + 1) - gain[P])
    gain[P].sortedDelete()
return schedule
That is, it takes O(nlogn) to sort the initial gain array and O(Tlogn) to run through the loop. I have thought through this problem more than I care to admit and cannot come up with an algorithm that would run in O(n + Tlogn).
For the first part, use a heap: constructing the heap takes O(n) time, and each ExtractMin and DecreaseKey call takes O(log n) time.
For the second part, construct an n x T table where the i-th column denotes the solution for the case T=i. The (i+1)-th column depends only on the values in the i-th column and the function F, so it can be computed in O(nT) time. I did not think through all the cases thoroughly, but this should give you a good start.
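Here is a sketch of the heap-based version in Python (my own illustration, with made-up diminishing-return functions; heapq is a min-heap, so gains are stored negated). Building the heap is O(n), and each of the T iterations does one pop and one push at O(log n) each, which gives the required O(n + T log n):

import heapq
import math

def schedule_time(F, n, T):
    # F[i](x) = points for spending x units on project i; F[i](0) is assumed to be 0.
    schedule = [0] * n
    heap = [(-(F[i](1) - F[i](0)), i) for i in range(n)]   # marginal gain of the 1st unit
    heapq.heapify(heap)                                     # O(n)
    for _ in range(T):
        neg_gain, i = heapq.heappop(heap)                   # O(log n)
        schedule[i] += 1
        next_gain = F[i](schedule[i] + 1) - F[i](schedule[i])
        heapq.heappush(heap, (-next_gain, i))               # O(log n)
    return schedule

# Hypothetical concave (diminishing-return) point functions for a quick check:
F = [lambda x, a=a: a * math.sqrt(x) for a in (3.0, 5.0, 2.0)]
print(schedule_time(F, 3, 10))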

From the given array of numbers, find all groups of 3 numbers with sum value N

Given is an array of numbers:
1, 2, 8, 6, 9, 0, 4
We need to find all the groups of three numbers which sum to a value N (say 11 in this example). Here, the possible groups of three are:
{1,2,8}, {1,4,6}, {0,2,9}
The first solution I could think of was O(n^3). Later I could improve it a little (to O(n^2 log n)) with the approach:
1. Sort the array.
2. Select any two numbers and perform binary search for the third element.
Can it be improved further with some other approaches?
You can certainly do it in O(n^2): for each i in the array, test whether two other values sum to N-i.
You can test in O(n) whether two values in a sorted array sum to k by sweeping from both ends at once. If the sum of the two elements you're on is too big, decrement the "right-to-left" index to make it smaller. If the sum is too small, increment the "left-to-right" index to make it bigger. If there's a pair that works, you'll find it, and you perform at most 2*n iterations before you run out of road at one end or the other. You might need code to ignore the value you're using as i, depending on what the rules are.
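A Python sketch of that two-pointer idea (my own illustration; it can report the same triple more than once if the array contains many duplicate values, and deduplication is left out for brevity):

def three_sum(nums, target):
    # Sort once, then for each i sweep the rest of the array from both ends
    # looking for a pair that sums to target - nums[i]. O(n^2) overall.
    nums = sorted(nums)
    results = []
    for i in range(len(nums) - 2):
        want = target - nums[i]
        lo, hi = i + 1, len(nums) - 1
        while lo < hi:
            s = nums[lo] + nums[hi]
            if s == want:
                results.append((nums[i], nums[lo], nums[hi]))
                lo += 1
                hi -= 1
            elif s < want:
                lo += 1
            else:
                hi -= 1
    return results

print(three_sum([1, 2, 8, 6, 9, 0, 4], 11))   # [(0, 2, 9), (1, 2, 8), (1, 4, 6)]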
You could instead use some kind of dynamic programming, working down from N, and you probably end up with time something like O(n*N) or so. Realistically I don't think that's any better: it looks like all your numbers are non-negative, so if n is much bigger than N then before you start you can quickly throw out any large values from the array, and also any duplicates beyond 3 copies of each value (or 2 copies, as long as you check whether 3*i == N before discarding the 3rd copy of i). After that step, n is O(N).

Resources