Dynamic programming in power sets - algorithm

Is it possible to use dynamic programming in the calculation of power set of a string (ie, all possible subsequences of that string) to reduce the number of computations significantly?

No. If you are calculating a powerset, you are calculating a powerset, which always has the same number of elements.

You can never reduce complexity below linear in the size of the output, because you need to go through each of the output bits one way or another. This is true for all problems, regardless of the algorithm used. So 2^n is a lower bound for computing the power set, because you need to output 2^n strings (and each string has multiple characters, with average length growing with n, so the bound is even higher).
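As a minimal sketch of why the output alone forces this bound, here is a straightforward enumeration of all subsequences using the standard library; the result list has exactly 2^n entries, so no algorithm producing it can do better than that:

```python
from itertools import combinations

def power_set(s):
    """Enumerate all 2^n subsequences of s (including the empty one)."""
    subs = []
    for r in range(len(s) + 1):
        for combo in combinations(s, r):
            subs.append("".join(combo))
    return subs

subs = power_set("abc")
print(len(subs))  # 2^3 = 8 subsequences
```

Dynamic programming can still help with related questions whose output is small (e.g. *counting* distinct subsequences), but not with producing the full power set.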

Related

Do problem constraints change the time complexity of algorithms?

Let's say that the algorithm involves iterating through a string character by character.
If I know for sure that the length of the string is less than, say, 15 characters, will the time complexity be O(1) or will it remain as O(n)?
There are two aspects to this question - the core of the question is, can problem constraints change the asymptotic complexity of an algorithm? The answer to that is yes. But then you give an example of a constraint (strings limited to 15 characters) where the answer is: the question doesn't make sense. A lot of the other answers here are misleading because they address only the second aspect but try to reach a conclusion about the first one.
Formally, the asymptotic complexity of an algorithm is measured by considering a set of inputs where the input sizes (i.e. what we call n) are unbounded. The reason n must be unbounded is because the definition of asymptotic complexity is a statement like "there is some n0 such that for all n ≥ n0, ...", so if the set doesn't contain any inputs of size n ≥ n0 then this statement is vacuous.
Since algorithms can have different running times depending on which inputs of each size we consider, we often distinguish between "average", "worst case" and "best case" time complexity. Take for example insertion sort:
In the average case, insertion sort has to compare the current element with half of the elements in the sorted portion of the array, so the algorithm does about n²/4 comparisons.
In the worst case, when the array is in descending order, insertion sort has to compare the current element with every element in the sorted portion (because it's less than all of them), so the algorithm does about n²/2 comparisons.
In the best case, when the array is in ascending order, insertion sort only has to compare the current element with the largest element in the sorted portion, so the algorithm does about n comparisons.
However, now suppose we add the constraint that the input array is always in ascending order except for its smallest element:
Now the average case does about 3n/2 comparisons,
The worst case does about 2n comparisons,
And the best case does about n comparisons.
Note that it's the same algorithm, insertion sort, but because we're considering a different set of inputs where the algorithm has different performance characteristics, we end up with a different time complexity for the average case because we're taking an average over a different set, and similarly we get a different time complexity for the worst case because we're choosing the worst inputs from a different set. Hence, yes, adding a problem constraint can change the time complexity even if the algorithm itself is not changed.
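To make this concrete, here is a small sketch (written for this answer, not from any particular library) that counts the comparisons insertion sort makes. On a plain ascending array it does about n comparisons; on the constrained worst case, an ascending array with the smallest element moved to the end, it does about 2n, matching the figures above:

```python
def insertion_sort_comparisons(a):
    """Run insertion sort on a copy of a and return the number of
    element comparisons performed."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of key vs a[j]
            if a[j] > key:
                a[j + 1] = a[j]       # shift larger element right
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

n = 1000
print(insertion_sort_comparisons(list(range(n))))           # best case: n - 1
print(insertion_sort_comparisons(list(range(1, n)) + [0]))  # constrained worst: ~2n
```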
However, now let's consider your example of an algorithm which iterates over each character in a string, with the added constraint that the string's length is at most 15 characters. Here, it does not make sense to talk about the asymptotic complexity, because the input sizes n in your set are not unbounded. This particular set of inputs is not valid for doing such an analysis with.
In the mathematical sense, yes. Big-O notation describes the behavior of an algorithm in the limit, and if you have a fixed upper bound on the input size, that implies it has a maximum constant complexity.
That said, context is important. All computers have a realistic limit to the amount of input they can accept (a technical upper bound). Just because nothing in the world can store a yottabyte of data doesn't mean saying every algorithm is O(1) is useful! It's about applying the mathematics in a way that makes sense for the situation.
Here are two contexts for your example, one where it makes sense to call it O(1), and one where it does not.
"I decided I won't put strings of length more than 15 into my program, therefore it is O(1)". This is not a super useful interpretation of the runtime. The actual time is still strongly tied to the size of the string; a string of size 1 will run much faster than one of size 15 even if there is technically a constant bound. In other words, within the constraints of your problem there is still a strong correlation to n.
"My algorithm will process a list of n strings, each with maximum size 15". Here we have a different story; the runtime is dominated by having to run through the list! There's a point where n is so large that the time to process a single string doesn't change the correlation. Now it makes sense to consider the time to process a single string O(1), and therefore the time to process the whole list O(n).
That said, Big-O notation doesn't have to only use one variable! There are problems where upper bounds are intrinsic to the algorithm, but you wouldn't put a bound on the input arbitrarily. Instead, you can describe each dimension of your input as a different variable:
n = list length
s = maximum string length
=> O(n*s)
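A hypothetical example of such a two-variable analysis (the function name and the vowel-counting task are made up for illustration):

```python
def count_vowels_in_list(strings):
    """O(n*s) time: n = number of strings, s = maximum string length."""
    total = 0
    for text in strings:          # n iterations over the list
        for ch in text:           # up to s iterations per string
            if ch in "aeiou":
                total += 1
    return total

print(count_vowels_in_list(["hello", "world", "big", "oh"]))  # 2+1+1+1 = 5
```

If s is fixed at 15 you may fold it into the constant and call this O(n); if s can grow, O(n*s) is the more informative statement.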
It depends.
If your algorithm's requirements would grow if larger inputs were provided, then the algorithmic complexity can (and should) be evaluated independently of the inputs. So iterating over all the elements of a list, array, string, etc., is O(n) in relation to the length of the input.
If your algorithm is tied to the limited input size, then that fact becomes part of your algorithmic complexity. For example, maybe your algorithm only iterates over the first 15 characters of the input string, regardless of how long it is. Or maybe your business case simply indicates that a larger input would be an indication of a bug in the calling code, so you opt to immediately exit with an error whenever the input size is larger than a fixed number. In those cases, the algorithm will have constant requirements as the input length tends toward very large numbers.
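Both patterns described above can be sketched as follows (`MAX_LEN`, the function names, and the upper-casing task are hypothetical, chosen only to illustrate the two behaviors):

```python
MAX_LEN = 15  # hypothetical business-rule bound on input size

def process_prefix(s):
    """Only ever looks at the first MAX_LEN characters,
    so the work is constant-bounded regardless of len(s)."""
    return s[:MAX_LEN].upper()

def process_strict(s):
    """Treats oversized input as a caller bug and exits immediately,
    so the work is again constant-bounded."""
    if len(s) > MAX_LEN:
        raise ValueError("input longer than expected; likely a bug in the caller")
    return s.upper()

print(process_prefix("a" * 1000))  # only 15 characters of work
```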
From Wikipedia
Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.
...
In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows.
In practice, almost all inputs have limits: you cannot input a number larger than what's representable by the numeric type, or a string that's larger than the available memory space. So it would be silly to say that any limits change an algorithm's asymptotic complexity. You could, in theory, use 15 as your asymptote (or "particular value"), and therefore use Big-O notation to define how an algorithm grows as the input approaches that size. There are some algorithms with such terrible complexity (or some execution environments with limited-enough resources) that this would be meaningful.
But if your argument (string length) does not tend toward a large enough value for some aspect of your algorithm's complexity to define the growth of its resource requirements, it's arguably not appropriate to use asymptotic notation at all.
NO!
The time complexity of an algorithm is independent of program constraints. Here is (a simple) way of thinking about it:
Say your algorithm iterates over the string and appends all consonants to a list.
Now, the iteration has time complexity O(n). This means that the time taken will increase roughly in proportion to the length of the string. (The actual time would still vary with things like the cost of the if statement and branch prediction.)
The fact that you know that the string is between 1 and 15 characters long will not change how the program runs, it merely tells you what to expect.
For example, knowing that your values are going to be less than 65000 you could store them in a 16-bit integer and not worry about Integer overflow.
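The consonant-collecting loop described above might look like this (a sketch, not any particular program):

```python
def collect_consonants(s):
    """Iterate the string once (O(n)) and collect its consonants."""
    consonants = []
    for ch in s:
        if ch.isalpha() and ch.lower() not in "aeiou":
            consonants.append(ch)
    return consonants

print(collect_consonants("algorithm"))  # ['l', 'g', 'r', 't', 'h', 'm']
```

Knowing the string never exceeds 15 characters bounds the running time but does not change the loop: it still visits every character, once each.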
Do problem constraints change the time complexity of algorithms?
No.
"If I know for sure that the length of the string is less than, say, 15 characters ..."
We already know the length of the string is less than SIZE_MAX. Knowing an upper fixed bound for string length does not make the time complexity O(1).
Time complexity remains O(n).
Big-O measures the complexity of algorithms, not of code. It means Big-O does not know the physical limitations of computers. A Big-O measure today will be the same in 1 million years when computers, and programmers alike, have evolved beyond recognition.
So restrictions imposed by today's computers are irrelevant for Big-O. Even though any loop is finite in code, that need not be the case in algorithmic terms. The loop may be finite or infinite. It is up to the programmer/Big-O analyst to decide. Only s/he knows which algorithm the code intends to implement. If the number of loop iterations is finite, the loop has a Big-O complexity of O(1) because there is no asymptotic growth with N. If, on the other hand, the number of loop iterations is infinite, the Big-O complexity is O(N) because there is an asymptotic growth with N.
The above is straight from the definition of Big-O complexity. There are no ifs or buts. The way the OP describes the loop makes it O(1).
A fundamental requirement of big-O notation is that parameters do not have an upper limit. Suppose performing an operation on N elements takes a time precisely equal to 3E24*N*N*N / (1E24+N*N*N) microseconds. For small values of N, the execution time would be proportional to N^3, but as N gets larger the N^3 term in the denominator would start to play an increasing role in the computation.
If N is 1, the time would be 3 microseconds.
If N is 1E3, the time would be about 3E33/1E24, i.e. 3.0E9.
If N is 1E6, the time would be about 3E42/1E24, i.e. 3.0E18
If N is 1E7, the time would be 3E45/1.001E24, i.e. ~2.997E21
If N is 1E8, the time would be about 3E48/2E24, i.e. 1.5E24
If N is 1E9, the time would be 3E51/1.001E27, i.e. ~2.997E24
If N is 1E10, the time would be about 3E54/1.000001E30, i.e. 2.999997E24
As N gets bigger, the time would continue to grow, but no matter how big N gets the time would always be less than 3.000E24 microseconds. Thus, the time required for this algorithm would be O(1), because one could specify a constant k such that the time necessary to perform the computation with size N would be less than k.
For any practical value of N, the time required would be proportional to N^3, but from a big-O standpoint the worst-case time requirement is constant. The fact that the time changes rapidly in response to small values of N is irrelevant to the "big picture" behaviour, which is what big-O notation measures.
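The saturation is easy to check numerically; this is the exact function from the example above:

```python
def runtime_microseconds(n):
    """The hypothetical running time 3E24 * N^3 / (1E24 + N^3) from above."""
    return 3e24 * n**3 / (1e24 + n**3)

for n in [1, 10**3, 10**8, 10**10, 10**12]:
    print(n, runtime_microseconds(n))
# grows like N^3 at first, then flattens out below the constant 3e24
```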
It will be O(1) i.e. constant.
This is because for calculating time complexity or worst-case time complexity (to be precise), we think of the input as a huge chunk of data and the length of this data is assumed to be n.
Let us say, we do some maximum work C on each part of this input data, which we will consider as a constant.
In order to get the worst-case time complexity, we need to loop through each part of the input data i.e. we need to loop n times.
So, the time complexity will be:
n x C.
Since you fixed n to be less than 15 characters, n can also be assumed as a constant number.
Hence in this case:
n = constant and,
(maximum constant work done) = C = constant
So time complexity is n x C = constant x constant = constant i.e. O(1)
Edit
The reason why I have said n = constant and C = constant for this case, is because the time difference for doing calculations for smaller n will become so insignificant (compared to n being a very large number) for modern computers that we can assume it to be constant.
Otherwise, every function ever built would take some amount of time, and we couldn't say things like:
lookup time is constant for hashmaps

Big O notation for inverse exponential algorithm

Let's say you had an algorithm which had n^(-1/2) complexity, say a scientific algorithm where one sample doesn't give much information so it takes ages to process it, but many samples to cross-reference made it faster. Would you represent that as O(n^(-1/2))? Is that even possible theoretically? Tldr can you have an inverse exponential time complexity?
You could define O(n^(-0.5)) using this set:
O(n^(-0.5)) := {g(n) : There exist positive constants c and N such that 0<=g(n)<=cn^(-0.5), for n > N}.
The function n^(-1), for example, belongs to this set.
None of the elements of the set above, however, could be an upper bound on the running time of an algorithm.
Note that for any constant c:
if: n>c^2 then: n^(-0.5)*c < 1.
This means that your algorithm does less than one simple operation once the input is large enough. Since it must execute a natural number of simple operations, it does exactly 0 operations, nothing at all.
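A quick numeric check of the inequality above (the constant c = 1000 is arbitrary):

```python
c = 1000.0
n = c**2 + 1          # any n > c^2
bound = c * n ** -0.5
print(bound)          # strictly less than 1
```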
A decreasing running time doesn't make sense in practice (even less if it decreases to zero). If that existed, you would find ways to add dummy elements and increase N artificially.
But most algorithms have at least O(N) complexity (whenever every data element influences the final solution); even if not, just the representation of N gets longer and longer, which will eventually increase the running time (like O(log N)).

Can an integer which must hold the value of n contribute to space complexity?

If an algorithm requires an integer which can contain the number n (e.g. counting the size of an input array), that integer must take on the order of log(n) space (right?).
If this is the only space which scales with n, is the space complexity of the algorithm O(logn)?
Formally, it depends on the model of computation you're using. In the classical random access machine model, with a fixed word size, simply storing the length n of the input indeed requires O(log n) space (and simple arithmetic on such numbers takes O(log n) time). On the other hand, in the transdichotomous model, where the word size is assumed to grow logarithmically with the input size n, it only requires O(1) space (and time). Other models may yield yet other answers.
In practice, most analysis of algorithms assumes that simple arithmetic on moderate-size integers (i.e. proportional to the input length) can be done in constant time and space. A practical reason for this, besides the fact that it simplifies analysis, is that in practice the logarithm of the input length cannot grow very large — even a computer capable of counting up from zero to, say, 2^256, much less of reading that many bits of input, is probably forever beyond the means of humankind to build using known physics. Thus, for any conceivable realistic inputs, you can simply assume that a machine with a 256-bit word size can store the length of the input in a single word (and that machines with a smaller word size still need only a small constant number of words).
Here n is bounded, i.e. n fits in a 32-bit signed integer since array sizes have that limitation. So log(n) is at most 32, which is bounded, and the space is O(1).

Does multiplication take unit time?

I have the following problem
Under what circumstances can multiplication be regarded as a unit time
operation?
But I thought multiplication is always considered to be taking unit time. Was I wrong?
It depends on what N is. If N is the number of bits in an arbitrarily large number, then as the number of bits increases, it takes longer to compute the product. However, in most programming languages and applications, the size of numbers are capped to some reasonable number of bits (usually 32 or 64). In hardware, these numbers are multiplied in one step that does not depend on the size of the number.
When the number of bits is a fixed number, like 32, then it doesn't make sense to talk about asymptotic complexity, and you can treat multiplication as an O(1) operation in terms of whatever algorithm you're looking at. When numbers can become arbitrarily large, like with Java's BigInteger class, then multiplication depends on the size of those numbers, as does the memory required to store them.
Only in those cases where you're performing operations on two numbers of a fixed numeric type (emphasis here, not going into the binary detail), you can simply assume that the operation being performed takes constant time.
It's not defined as unit time but, more strictly, as a constant time interval which doesn't change even if we increase the size of the number (though in reality the calculation does take subtly more time for large numbers). These costs are generally considered trivial, unless the numbers being multiplied are too large, like BigIntegers in Java, etc.
But as soon as we move towards performing multiplication of binary strings, the complexity increases, and the naive method yields a complexity of O(n^2).
So, to simplify, we perform a divide and conquer-based multiplication, also known as Karatsuba's algorithm, which has a complexity of O(n^1.59) and reduces the total number of multiplications (at the cost of a few extra additions).
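A minimal sketch of Karatsuba's idea: split each number in half and replace the four half-size multiplications of the naive method with three, which is what yields the O(n^log2(3)) ≈ O(n^1.585) bound:

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with 3 recursive
    multiplies per level instead of 4."""
    if x < 10 or y < 10:              # base case: single-digit multiply
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    high_x, low_x = divmod(x, 10**m)  # split each number at digit m
    high_y, low_y = divmod(y, 10**m)
    z0 = karatsuba(low_x, low_y)
    z2 = karatsuba(high_x, high_y)
    # the middle term reuses z0 and z2, saving the fourth multiply
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2
    return z2 * 10**(2*m) + z1 * 10**m + z0

print(karatsuba(1234, 5678))  # 7006652
```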
I hope I haven't misjudged the question. If so, please alert me so that I can remove this answer. If I understood the question properly, then the other answer posted here seems incomplete.
The expression unit time is a little ambiguous (and AFAIK not much used).
True unit time is achieved when the multiply is performed in a single clock cycle. This rarely occurs on modern processors.
If the execution time of the multiply does not depend on the particular values of the operands, we can say that it is performed in constant time.
When the operand length is bounded, so that the time never exceeds a given duration, we also say that an operation is performed in constant time.
This constant duration can be used as the timing unit of the running time, so that you count in "multiplies" instead of seconds (ops, flops).
Lastly, you can evaluate the performance of an algorithm in terms of the number of multiplies it performs, independently of the time they take.

What is complexity measured against? (bits, number of elements, ...)

I've read that the naive approach to testing primality has exponential complexity because you judge the algorithm by the size of its input. Mysteriously, people insist that when discussing primality of an integer, the appropriate measure of the size of the input is the number of bits (not n, the integer itself).
However, when discussing an algorithm like Floyd's, the complexity is often stated in terms of the number of nodes without regard to the number of bits required to store those nodes.
I'm not trying to make an argument here. I honestly don't understand the reasoning. Please explain. Thanks.
Traditionally speaking, the complexity is measured against the size of input.
In case of numbers, the size of input is log of this number (because it is a binary representation of it), in case of graphs, all edges and vertices must be represented somehow in the input, so the size of the input is linear in |V| and |E|.
For example, a naive primality test that runs in time linear in the number itself is called pseudo-polynomial. It is polynomial in the number, but it is NOT polynomial in the size of the input, which is log(n); in fact, it is exponential in the size of the input.
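A sketch of such a pseudo-polynomial test, with the bookkeeping spelled out in the comments:

```python
def is_prime_naive(n):
    """Trial division: O(n) in the VALUE of n, but the input is only
    about log2(n) bits long, so this is exponential in the input size."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

# 2**31 - 1 is a 31-bit input, yet this loop would run ~2**31 times on it
print(is_prime_naive(97))   # True
print(is_prime_naive(91))   # False (7 * 13)
```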
As a side note, it does not matter whether you measure the size of the input in bits, bytes, or any other constant factor, because a constant factor will be discarded anyway when deriving the asymptotic notation.
The main difference is that when discussing algorithms we keep in the back of our mind a hardware that is able to perform operations on the data used in O(1) time. When being strict or when considering data which is not able to fit into the processors register then taking the number of bits in account becomes important.
Although the size of input is measured in the number of bits, in many cases we can use a shortcut that lets us divide out a constant number of bits. This constant factor is embedded in the representation that we choose for our data structure.
When discussing graph algorithms, we assume that each vertex and each edge has a fixed cost of representation in terms of the number of bits, which does not depend on the number of vertices and edges. This assumption requires that weights associated with vertices and edges have fixed size in terms of the number of bits (i.e. all integers, all floats, etc.)
With this assumption in place, adjacency list representation has fixed size per edge or vertex, because we need one pointer per edge and one pointer per vertex, in addition to the weights, which we presume to be of constant size as well.
Same goes for the adjacency matrix representation, because we need on the order of W·V^2 bits for the matrix, where W is the number of bits required to store a weight.
In rare situations when weights themselves are dependent on the number of vertices or edges the assumption of fixed weight no longer holds, so we must go back to counting the number of bits.