Naive string search algorithm - average time

I'm studying the naive string search algorithm (aka the brute-force algorithm). I know there exist other, more efficient algorithms, but since I'm starting from the basics, I'm currently interested only in this one.
And I have a question, as follows:
What is the average time complexity (Θ) for this algorithm?
I have found that the best and worst cases are Θ(N) and Θ(M*N), respectively.

From your comment to the question, it seems that the N text characters are uniformly randomly generated. For this setting, brute force's average time is O(N - M), irrespective of the way the search string is generated. (Note that Wikipedia states O(N + M), but we can actually deduce O(N - M) using the following analysis. See also these lecture notes).
Consider the iteration where the search string is matched against the text at position i of the text. For any search string, each character of the search string has a probability p = 255/256 of not matching the corresponding character of the text (assuming a 256-character alphabet). Say we define a "success" to be a mismatch. Then the number of attempts until the first success follows a geometric distribution, with an expected (1 - p) / p = (1/256) / (255/256) = 1/255 = O(1) failures before success.
So, for position i, the expected cost is O(1). By linearity of expectation, we now need to sum over all relevant i. There are Θ(N - M) such i.
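To make the setting concrete, here is a minimal sketch of the brute-force search being analysed (C++; the function name, 0-based indexing and use of std::string are my own choices, not part of the question):

#include <string>

// Brute-force search: returns the first index at which `pattern` occurs in
// `text`, or -1 if it does not occur. N = text.size(), M = pattern.size().
int naiveSearch(const std::string& text, const std::string& pattern) {
    int N = (int)text.size(), M = (int)pattern.size();
    for (int i = 0; i + M <= N; ++i) {              // Θ(N - M + 1) alignments
        int j = 0;
        while (j < M && text[i + j] == pattern[j])  // on random text, expected
            ++j;                                    // O(1) comparisons per i
        if (j == M) return i;
    }
    return -1;
}

The outer loop runs over the Θ(N - M) alignment positions; the inner loop is the per-position work whose expectation is O(1) in the analysis above.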

Related

Would this n x n transpose algorithm be considered an in place algorithm?

Based on my research, I am getting conflicting information about this simple algorithm. The algorithm is a basic matrix transposition that transposes an n x n matrix A.
My current understanding is that this algorithm would run in O(n^2) time and have a space complexity of O(1), as the matrix we manipulate is the same one we started with.
But I have also been told it would actually run in O(n) time and have a space complexity of O(n) as well, which would mean it isn't in-place, as it would require extra space for manipulation.
Which thought process is correct here for the transpose algo below?
Transpose(A)
  for i = 1 to n - 1
    for j = i + 1 to n
      exchange A[i, j] with A[j, i]
Some confusion might arise from the fact that the workload is proportional to the number of elements in the array, and that these elements occupy their own space. So by some abuse of notation, or by inattention, both might be said to be "O(n)".
But this is wrong because
n is clearly not the number of elements but the number of rows/columns of the array;
by definition, the space complexity does not count the input and output data, only whatever auxiliary space is required.
Hence we can confirm that the time complexity is O(n²) - in fact Θ(n²) - and the space complexity is O(1). The algorithm is in-place.
Final remark:
If we denote the number of elements as m, the time complexity is O(m), and there is no contradiction.
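For reference, here is a minimal C++ transcription of the pseudocode above (0-based indexing, matrix given as a vector of rows; both are my own conventions). It makes the Θ(n²)-time / O(1)-auxiliary-space claim easy to see:

#include <utility>
#include <vector>

// In-place transpose of an n x n matrix A (a vector of rows).
// Each of the n(n-1)/2 pairs above the diagonal is swapped exactly once: Θ(n²) time.
// The only extra storage is the loop counters and the swap temporary: O(1) space.
void transpose(std::vector<std::vector<int>>& A) {
    int n = (int)A.size();
    for (int i = 0; i < n - 1; ++i)
        for (int j = i + 1; j < n; ++j)
            std::swap(A[i][j], A[j][i]);
}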

Checking if an integer is an integer power of another?

This is identical to the question found on Check if one integer is an integer power of another, but I am wondering about the complexity of a method that I came up with to solve this problem.
Given an integer n and another integer m, determine whether n = m^p for some integer p. Note that ^ here denotes exponentiation, not xor.
There is a simple O(log_m n) solution based on dividing n repeatedly by m until it's 1 or until there's a non-zero remainder.
I'm thinking of a method inspired by binary search, and it's not clear to me how complexity should be calculated in this case.
Essentially you start with m, then you go to m^2, then m^4, m^8, m^16, .....
When you find that m^{2^k} > n, you check the range bounded by m^{2^{k-1}} and m^{2^k}. Is this solution O(log_2 (log_m(n)))?
Somewhat related to this, if I do something like
m^2 * m^2
vs.
m * m * m * m
Do these 2 have the same complexity? If they do, then I think the algorithm I came up with is still O(log_m (n))
Not quite. First of all, let's assume that multiplication is O(1), and exponentiation a^b is O(log b) (using exponentiation by squaring).
Now, using your method of doubling the candidate exponent p_candidate and then doing a binary search, you can find the real p in log(p) steps (or observe that no such p exists). But each probe within the binary search requires you to compute m^p_candidate, and since p_candidate = O(p), that computation costs O(log(p)) by the assumption above. So the overall time complexity is O(log^2(p)).
But we want to express the time complexity in terms of the inputs n and m. From the relationship n = m^p, we get p = log(n)/log(m), and hence log(p) = log(log(n)/log(m)). Hence the overall time complexity is
O(log^2(log(n)/log(m)))
If you want to get rid of the m, you can provide a looser upper bound by using
O(log^2(log(n)))
which is close to, but not quite O(log(log(n))).
(Note that you can always omit the logarithmic bases in the O-notation since all logarithmic functions differ only by a constant factor.)
Now, the interesting question is: is this algorithm better than one that is O(log(n))? I haven't proved it, but I'm pretty certain that O(log^2(log(n))) is contained in O(log(n)) but not vice versa. Anyone care to prove it?
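For concreteness, here is a minimal C++ sketch of the doubling-plus-binary-search idea from the question (the helper names are mine, and overflow of m^e is deliberately ignored to keep the sketch short):

#include <cstdint>

// Exponentiation by squaring: O(log e) multiplications (overflow ignored).
uint64_t ipow(uint64_t m, uint64_t e) {
    uint64_t r = 1;
    while (e > 0) {
        if (e & 1) r *= m;
        m *= m;
        e >>= 1;
    }
    return r;
}

// Is n = m^p for some integer p? Double the candidate exponent until
// m^hi >= n, then binary-search the exponent in [hi/2, hi].
bool isPower(uint64_t n, uint64_t m) {
    if (m < 2) return n == m;              // degenerate bases 0 and 1
    if (n == 0) return false;
    uint64_t hi = 1;
    while (ipow(m, hi) < n) hi *= 2;       // O(log p) doublings
    uint64_t lo = hi / 2;
    while (lo <= hi) {                     // O(log p) probes, each costing O(log p)
        uint64_t mid = lo + (hi - lo) / 2;
        uint64_t v = ipow(m, mid);
        if (v == n) return true;
        if (v < n) lo = mid + 1; else hi = mid - 1;
    }
    return false;
}

Each probe recomputes the power from scratch with ipow, which is exactly where the extra log factor in O(log^2(p)) comes from.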

Time complexity for modified binary search which calculates the mid as high - 2

I have to find out the time complexity of a binary search that calculates the dividing point as mid = high - 2 (instead of mid = (low + high)/2), so as to know how much slower or faster the modified algorithm would be.
The worst-case scenario is that the searched item is the very first one. In this case, since you always subtract 2 from n, you will have roughly n/2 steps, which is linear complexity. The best case is that the searched item is exactly at position n-2, which takes constant time. The average complexity, as n -> infinity, will be linear as well.
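A minimal sketch of the modified search being described may make the linear behaviour easier to see (C++; my own transcription, 0-based indexing, array assumed sorted ascending):

// "Binary" search whose dividing point is mid = high - 2 instead of the middle.
// Each miss discards only a constant number of elements, so an unsuccessful
// search over n sorted elements takes Θ(n) steps rather than Θ(log n).
int modifiedSearch(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = high - 2;
        if (mid < low) mid = low;          // keep the probe inside the range
        if (a[mid] == key) return mid;
        if (a[mid] < key) low = mid + 1;   // at most two candidates remain
        else high = mid - 1;               // only ~3 elements are discarded
    }
    return -1;
}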
Hint: You can derive the answer based on the recurrence formula for binary search.
We have T(n) = T(floor(n/2)) + O(1)
Since we divide into two equal halves, we have floor(n/2). You should rewrite the given formula to describe the modified version. Furthermore, you should use the Akra-Bazzi method to solve the recurrence for the modified version, since you are now dividing into two unbalanced parts.
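For what it is worth, here is a sketch of how the rewritten recurrence can look, under the reading that each comparison of the modified version discards only a constant number of elements (which is consistent with the linear bound in the other answer):
T(n) = T(n - 2) + O(1)
Unrolling this gives roughly n/2 constant-cost levels, i.e. T(n) = Θ(n).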

Time complexity for n-ary search.

I am studying time complexity for binary search, ternary search and k-ary search in N elements and have come up with their respective asymptotic worst-case run-times. However, I started to wonder what would happen if I divide the N elements into N ranges (i.e. an N-ary search over the N elements). Would that just be a linear search over the sorted array, resulting in a run-time of O(N)? This is a little confusing. Please help me out!
What you say is right.
For a k-ary search we have:
Do k-1 checks at the range boundaries to isolate one of the k ranges.
Jump into the range obtained from above.
Hence the time complexity is essentially O((k-1)*log_k(N)) where log_k(N) means 'log(N) to base k'. This has a minimum when k=2.
If k = N, the time complexity will be: O((N-1) * log_N(N)) = O(N-1) = O(N), which is the same algorithmically and complexity-wise as linear search.
Translated to the algorithm above, it is:
Do N-1 checks at the boundaries (one for each of the first N-1 elements) to isolate one of the N ranges. This is the same as a linear search over the first N-1 elements.
Jump into the range obtained from above. This is the same as checking the last element (in constant time).
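In case a concrete version helps, here is a minimal C++ sketch of the k-ary search described above (names and details are mine; the array is assumed sorted ascending and k >= 2):

#include <vector>

// k-ary search: each pass makes roughly k-1 boundary comparisons to isolate one
// of k sub-ranges, and there are about log_k(N) passes: O((k-1) * log_k(N)) overall.
int karySearch(const std::vector<int>& a, int key, int k) {
    int low = 0, high = (int)a.size() - 1;
    while (low <= high) {
        int len = high - low + 1;
        if (len <= k) {                        // small range: finish with a scan
            for (int j = low; j <= high; ++j)
                if (a[j] == key) return j;
            return -1;
        }
        int step = len / k;                    // size of each of the k ranges
        int i = low;
        while (i + step <= high && a[i + step] <= key)   // roughly k-1 boundary checks
            i += step;
        low = i;                               // jump into the isolated range
        if (i + step - 1 < high) high = i + step - 1;
    }
    return -1;
}

With k = 2 this behaves like a (slightly lopsided) binary search; with k = N it falls straight through to the final scan, i.e. it degenerates into the linear search described above.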

Precise Input Size and Time Complexity

When talking about time complexity we usually use n as input, which is not a precise measure of the actual input size. I am having trouble showing that, when using specific size for input (s) an algorithm remains in the same complexity class.
For instance, take a simple Sequential Search algorithm. In its worst case it takes W(n) time. If we apply specific input size (in base 2), the order should be W(lg L), where L is the largest integer.
How do I show that Sequential Search, or any algorithm, remains the same complexity class, in this case linear time? I understand that there is some sort of substitution that needs to take place, but I am shaky on how to come to the conclusion.
EDIT
I think I may have found what I was looking for, but I'm not entirely sure.
If you define worst case time complexity as W(s), the maximum number of steps done by an algorithm for an input size of s, then by definition of input size, s = lg n, where n is the input. Then, n = 2^s, leading to the conclusion that the time complexity is W(2^s), an exponential complexity. Therefore, the algorithm's performance with binary encoding is exponential, not linear as it is in terms of magnitude.
When talking about time complexity we usually use n as input, which is not a precise measure of the actual input size. I am having trouble showing that, when using specific size for input (s) an algorithm remains in the same complexity class.
For instance, take a simple Sequential Search algorithm. In its worst case it takes W(n) time. If we apply specific input size (in base 2), the order should be W(lg L), where L is the largest integer.
L is a variable that represents the largest integer.
n is a variable that represents the size of the input.
L is not a specific value anymore than n is.
When you apply a specific value, you aren't talking about a complexity class anymore, you are talking about an instance of that class.
Let's say you are searching a list of 500 integers. In other words, n = 500
The worst-case complexity class of Sequential Search is O(n)
The complexity is n
The specific instance of worst-case complexity is 500
Edit:
Your values will all be encoded with the same number of bits. If the input is a list of 32-bit integers, then c = 32, the number of bits per integer. The complexity would be 32*n => O(n).
In terms of L, if L is the largest value, and lg L is the number of bits required to encode L, then lg L is the constant c. Your complexity in terms of bits is c*n = O(n), where c = lg L is the constant per-item input size.
What I know is that the maximum number of steps done by Sequential Search is, obviously, cn^2 + n lg L. cn^2 being the number of steps to increment loops and do branching.
That's not true at all. The maximum number of steps done by a sequential search is going to be c*n, where n is the number of items in the list and c is some constant. That's the worst case. There is no n^2 component or logarithmic component.
For example, a simple sequential search would be:
// Returns the index of `query` in Items[0..NumItems-1], or -1 if it is not present.
int SequentialSearch(const int Items[], int NumItems, int query)
{
    for (int i = 0; i < NumItems; ++i)
    {
        if (Items[i] == query)
            return i;
    }
    return -1;
}
With that algorithm, if you search for each item, then half of the searches will require fewer than NumItems/2 iterations and half of the searches will require NumItems/2 or more iterations. If an item you search for isn't in the list, it will require NumItems iterations to determine that. The worst case running time is NumItems iterations. The average case is NumItems/2 iterations.
The actual number of operations performed is some constant, C, multiplied by the number of iterations. On average it's C*NumItems/2.
As Lucia Moura states: "Except for the unary encoding, all the other encodings for natural numbers have lengths that are polynomially related."
Here is the source. Take a look at page 19.
