What's the meaning of "input size" for 3-SAT? - algorithm

In computational complexity theory, we say that an algorithm has complexity O(f(n)) if the number of computations needed to solve a problem with input size n is bounded by c·f(n) for every integer n, where c is a positive constant not depending on n, and f(n) is an increasing function that tends to infinity as n does.
The 3-SAT problem is stated as: given a CNF expression whose clauses each have exactly 3 literals, is there some assignment of TRUE and FALSE values to the variables that makes the entire expression true?
A CNF expression consists of, say, k clauses involving m variables x1, ..., xm.
In order to decide whether or not 3-SAT has polynomial complexity P(n), I need to understand something as simple as "what is n" in this problem.
My question is:
What is considered, in this particular 3-SAT problem, the input size n?
Is it the number k of clauses? Or is it the number m of variables?
Or is n some function of k and m? (n = f(k, m))
I am having trouble with this simple issue.
According to the answer of Timmie Smith, we can consider the estimate:
k <= constant * f(m)
where f(m) is a polynomial function of m.
More precisely, f(m) can be taken to be a polynomial P(m) of degree 3 (that is, cubic).
Thus, if we consider the complexity f(k, m) of 3-SAT, we would have:
f(k, m) = f(P(m), m), with P(m) = m^3.
So, if the function f is polynomial in k and m, then it is actually polynomial in m. Thus, taking m as the input size, it would suffice to estimate whether a given algorithm is polynomial in m in order to know whether 3-SAT is in P or not.
If you agree, I will accept Timmie's answer as the correct one.
UPDATE:
I asked the same question here:
https://cstheory.stackexchange.com/questions/18756/whats-the-meaning-of-input-size-for-3-sat
The accepted answer was helpful to me.

The input size is the number m of variables. This is because the number of possible clauses that can be formed from m variables is a polynomial function of m.
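One way to see this: a 3-literal clause over distinct variables chooses 3 of the m variables and a polarity (plain or negated) for each, so the number of distinct clauses is at most

$$\binom{m}{3}\cdot 2^3=\frac{8\,m(m-1)(m-2)}{6}=O(m^3),$$

which is why the number of clauses k can always be bounded by a cubic polynomial in m.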

I've seen in some papers from Carnegie Mellon University the following definition of "size of input" for this kind of problem:
the number of bits it takes to write the input down
Considering that the input can be compressed, this definition makes sense to me, because it is a good measure of input entropy.
My 2 cents! Cheers!!

The input size is the number of variables m.
The reason for this is that the size of the search space that must be traversed to solve the problem is determined entirely by the number of variables: each variable has two possible states (1 or 0), and the search space consists of all possible assignments. A brute-force algorithm would simply test all 2^m possible assignments. Although most 3-SAT algorithms are significantly affected by the number of clauses, that does not influence the underlying problem's complexity.
Therefore the input size is also the number of variables for plain-old SAT, where the search space looks the same, although resolving clauses in a non-brute-force way works quite differently.
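To make the brute-force bound concrete, here is a minimal Python sketch (the clause encoding is my own choice, DIMACS-style signed integers) that tests all 2^m assignments:

```python
from itertools import product

def brute_force_3sat(num_vars, clauses):
    """Decide 3-SAT by trying all 2^m truth assignments.

    `clauses` is a list of 3-tuples of nonzero ints (DIMACS-style):
    literal v > 0 means variable v, v < 0 means NOT variable v.
    """
    for assignment in product([False, True], repeat=num_vars):
        # A clause is satisfied if at least one of its literals is true.
        if all(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            return assignment  # satisfying assignment found
    return None

# (x1 OR x2 OR NOT x3) AND (NOT x1 OR x2 OR x3)
print(brute_force_3sat(3, [(1, 2, -3), (-1, 2, 3)]))
```

The running time is O(2^m * k), so it is the number of variables m, not the number of clauses k, that drives the exponential cost.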


Problem Instance and Big-O Notation
Hello,
I am trying to understand a solution to a problem, but I don't understand the part of the solution about how the problem instance size was calculated.
Here are the questions and the solutions:
"An instance of a 'Perfect' decision problem is an integer n >= 1 and the problem is to decide if n is the sum of each of its proper divisors. For example, 6 is 'Perfect' because 1+2+3 = 6."
Since the input to 'Perfect' is a single integer n, an appropriate size parameter is m = log(n), since this is roughly the number of bits needed to represent n.
"Suppose an algorithm for deciding 'Perfect' requires O(n^2) steps, where n is the problem instance. Use the answer above and Big-O growth terminology to describe the growth of the algorithm's running time"
The algorithm has exponential running time since n^2 = (2^(logn))^2 = 4^(logn)
I can't seem to understand or figure out how the problem instance with the parameter size m = log(n) became 2^(log n)...
We are talking about m bits. m is equal to log(n), according to the statement.
Now every bit can be either 0 or 1.
Suppose you have a representation of 2 bits: _ _. Each of them can be either 0 or 1, so the number of possible representations is 2^2 = 4.
Similarly, the number of possible values in the case above is 2^m, i.e. 2^log(n).
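Written out as a chain of equalities (taking logs base 2), the substitution in question is just

$$n^2=\left(2^{\log_2 n}\right)^2=\left(2^m\right)^2=4^m,$$

which is exponential in the size parameter m.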
As stated by @AbhinavMathur above, the text is clearly wrong. The time to solve the problem is exponential in the number of bits.
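For concreteness, here is a minimal Python sketch (my own illustration, not from the textbook) of a 'Perfect' decider that takes O(n) arithmetic steps, i.e. about 2^m steps in the bit-length m = log2(n):

```python
def is_perfect(n: int) -> bool:
    """Decide 'Perfect' by summing all proper divisors of n.

    O(n) arithmetic steps: polynomial in the VALUE of n, but
    exponential (about 2^m steps) in its bit-length m = log2(n).
    """
    return n >= 2 and sum(d for d in range(1, n) if n % d == 0) == n

print(is_perfect(6), is_perfect(28), is_perfect(12))  # True True False
```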

Why is the knapsack problem not solvable in polynomial time with a dynamic programming algorithm?

I saw this explanation but I still could not fully understand it.
If we follow this logic: suppose some algorithm works in O(n) time. Then:
Let's assume n = 1000 in binary (4 bits long),
so the time complexity is T(n) = O(8).
Let's double the length of the input: n = 10000000 (8 bits long), so T(n) = O(128).
The time grows exponentially, so does that mean O(n) is not polynomial time?
The question is: "Polynomial as a function of what?". When we determine the complexity of algorithms, we usually, but not always, express it as a function of the length of the input. Often, but not always, this length is denoted by the letter n. For instance, if you study problems involving graphs, the letter n is often used for the number of vertices in the graph, rather than for the length of the input.
In your case, the variable n is the number of items, and variable W is the capacity of the bag.
Thus, the number n is relevant to the complexity; but it is not, in itself, the entire length of the input. Let's call N the real length of the input. Try not to confuse n and N. The complexity of your algorithms will have to be expressed as functions of N, and terms like "linear complexity", "polynomial complexity", "exponential complexity", etc., will always be in reference to N, not n.
What is the length N of the input?
Your input consists of a list of pairs (weight(i), value(i)), followed by the capacity W. Presumably, each item in the bag is allowed to have a weight ranging from 0 to W, and a value ranging from 0 to some maximum value V. Thus, weights need as many bits as W to be expressed, and values need as many bits as V to be expressed.
If you know a little about logarithms and about writing numbers, you should know that the number of bits required to write a number is proportional to its logarithm. Thus, every weight requires log(W) bits, and every value requires log(V) bits.
Thus: N = n * (log(W) + log(V)).
You can convince yourself that the lengths of W and V are relevant to the complexity. Here all our numbers are integers. But imagine a real world problem. A weight can be expressed in tons, or in grams. 1 ton is the same as 1000000 grams. A value can be expressed in cents, or in tens of thousands of euros. Before inputting our real-world problem into our algorithm, we need to choose our units.
Is your algorithm going to take a longer time to solve the problem, if the weight is expressed in grams rather than in tons?
The answer is yes. Because grams are a million times more precise than tons. So you are asking for a million times more precise answer, when asking the question in grams rather than in tons. Thus the algorithm will take more time to find that solution.
I hope I could convince you that the complexity of the algorithm should be expressed as a function of the actual length of the input, and not just the number of elements.
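To make this concrete, here is a minimal sketch of the classic dynamic-programming algorithm (variable names are my own) whose O(n * W) running time is polynomial in the numeric value W but exponential in the roughly log(W) bits it takes to write W down:

```python
def knapsack(weights, values, W):
    """Classic 0/1 knapsack DP: O(n * W) time and space.

    Pseudo-polynomial: W is a numeric value, so the table has
    n * W cells, which is exponential in the ~log2(W) bits of W.
    """
    n = len(weights)
    best = [0] * (W + 1)  # best[w] = max value achievable with capacity w
    for i in range(n):
        # Iterate capacities downward so each item is used at most once.
        for w in range(W, weights[i] - 1, -1):
            best[w] = max(best[w], best[w - weights[i]] + values[i])
    return best[W]

print(knapsack([2, 3, 4], [3, 4, 5], 5))  # 7: take items 0 and 1
```

Expressing W in grams instead of tons multiplies W (and hence the table size) by a million, exactly as the answer above describes, while the length of the input only grows by about 20 bits.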

difficult asymptotic(recurrence) function (algorithm analysis)

I am stuck on this one and I don't know how to solve it. No matter what I try, I can't find a way to manipulate the function so that I can represent it in a form that lets me find a g(n) with T(n) ∈ Θ(g(n)).
The function I am having trouble with is:
$T(n)=4n^4\,T(\sqrt n)+(n^6\lg n+3\lg^7 n)(2n^2\lg n+\lg^3 n)$
Additionally, if you can, could you please check whether I am on the right path with:
$T(n)=T(n-1)+\frac{1}{n}+\frac{1}{n^2}$
To solve it I tried telescoping: $T(n)-T(n-1)=\frac{1}{n}+\frac{1}{n^2}$, so $(T(n)-T(n-1))+(T(n-1)-T(n-2))+\ldots+(T(2)-T(1))=\left(\frac{1}{n}+\frac{1}{n-1}+\ldots+\frac{1}{2}\right)+\left(\frac{1}{n^2}+\frac{1}{(n-1)^2}+\ldots+\frac{1}{2^2}\right)$, which gives $T(n)=T(1)+\sum_{k=2}^n\frac{1}{k}+\sum_{k=2}^n\frac{1}{k^2}$, and then I used the harmonic series formula. However, I don't know how to continue from here and find the asymptotic bounds.
I hope that on the second one I am on the right path; however, I don't know how to solve the first one at all. If I've made any mistakes, please show me the right way so I can correct them.
Thank you very much for your help.
(Sorry that for some reason the math doesn't render correctly here.)
Following on from the comments:
Solving (2) first since it is more straightforward.
Your expansion attempt is correct. Writing it slightly differently:
$$T(n)=T(1)+\underbrace{\sum_{k=2}^n\frac{1}{k}}_{A}+\underbrace{\sum_{k=2}^n\frac{1}{k^2}}_{B}$$
A, the harmonic series, is asymptotically equal to the natural logarithm:
$$A=\sum_{k=2}^n\frac{1}{k}=\ln n+\gamma-1+o(1)=\Theta(\log n),$$
where γ = 0.57721... is the Euler-Mascheroni constant.
B, the sum of inverse squares, is bounded above by the infinite sum, which is the famous Basel problem:
$$B=\sum_{k=2}^n\frac{1}{k^2}<\sum_{k=1}^\infty\frac{1}{k^2}=\frac{\pi^2}{6}\approx 1.6449.$$
Therefore, since B is monotonically increasing and bounded, it is always O(1).
The total complexity of (2) is simply Θ(log n).
(1) is a little more tedious.
Little-o notation denotes a strictly lower complexity class, i.e.:
$$f(n)=o(g(n))\iff\lim_{n\to\infty}\frac{f(n)}{g(n)}=0.$$
Assume a set of N functions $\{F_i\}$ is ordered in decreasing order of complexity, i.e. $F_2=o(F_1)$ etc. Take a linear combination of them with constant coefficients $a_i$, $a_1\neq 0$:
$$\sum_{i=1}^N a_i F_i(n)=\Theta(F_1(n)).$$
Thus a sum of different functions is asymptotically equal to the one with the highest growth rate.
To sort the terms in the expansion of the two parentheses, note that
$$3\lg^7 n=o(n^6\lg n),\qquad \lg^3 n=o(2n^2\lg n),$$
provable by applying L'Hopital's rule. So the only asymptotically significant term is $n^6\lg n\cdot 2n^2\lg n=2n^8\lg^2 n$.
Expand the recurrence as before, noting that (i) the factor $4n^4$ accumulates, and (ii) the argument at the m-th expansion is $n^{1/2^m}$ (repeated square root).
The new term added by the m-th expansion is therefore (I will assume you know how to derive this, since you were able to do the same for (2)):
$$t_m=\left(\prod_{j=0}^{m-1}4\,n^{4/2^j}\right)\cdot 2\,n^{8/2^m}\left(\frac{\lg n}{2^m}\right)^2=4^m\,n^{8(1-2^{-m})}\cdot\frac{2\,n^{8/2^m}\lg^2 n}{4^m}=2\,n^8\lg^2 n.$$
Rather surprisingly, each added term is precisely equal to the first.
Assume that the stopping condition for the recursive expansion is n < 2 (which of course rounds down to T(1)); the maximum number of expansions M then satisfies $n^{1/2^M}<2$, i.e. $M=\Theta(\lg\lg n)$.
Since each added term t_m is always the same, simply multiply by the maximum number of expansions:
$$T(n)=\Theta\left(n^8\lg^2 n\cdot\lg\lg n\right).$$
Function (1) is therefore $\Theta(n^8\lg^2 n\,\lg\lg n)$.
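As a quick numerical sanity check of the Θ(log n) result obtained for (2) above (my own sketch, not part of the original answer): with T(1) = 0, the difference T(n) − ln n should converge to the constant γ + π²/6 − 2 ≈ 0.2221.

```python
import math

# Recurrence (2): T(n) = T(n-1) + 1/n + 1/n^2, taking T(1) = 0.
# From the closed form T(n) = (H_n - 1) + (sum of 1/k^2, k = 2..n),
# T(n) - ln(n) should approach gamma + pi^2/6 - 2 ~= 0.2221.
T = 0.0
for n in range(2, 10**6 + 1):
    T += 1.0 / n + 1.0 / n**2
    if n in (10**2, 10**4, 10**6):
        print(n, T - math.log(n))
```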

Big-O What are the constants k and n0 in the formal definition of the order of an algorithm?

In my textbook I see the following:
Definition of the order of an algorithm
Algorithm A is order f(n) -- denoted O(f(n)) -- if constants k and n0 exist such that A requires no more than k * f(n) time units to solve a problem of size n >= n0.
I understand: time requirements for different complexity classes grow at different rates. For instance, with increasing values of n, the time required for O(n) grows much more slowly than for O(n^2), which in turn grows more slowly than for O(n^3), and so forth.
I do not understand: How k and n0 fit into this definition.
What is n0? Specifically, why does n have the subscript 0, and what does this subscript mean?
With question 1 answered, what does 'a problem of size n >= n0' mean? A larger data set? More loop repetitions? A growing problem size?
What is k then? Why is k being multiplied by f(n)? What does k have to do with increasing the problem size - n?
I've already looked at:
Big Oh Notation - formal definition
Constants in the formal definition of Big O
What is an easy way for finding C and N when proving the Big-Oh of an Algorithm?
Confused on how to find c and k for big O notation if f(x) = x^2+2x+1
1) n >= n0 means that we agree that for small n, A might need more than k*f(n) operations. E.g., bubble sort might be faster than quick sort or merge sort for very small inputs. The choice of 0 as a subscript is entirely the author's preference.
2) Larger input size.
3) k is a constant. Suppose one algorithm performs 1000*n operations for an input of size n, so it is O(n). Another algorithm needs 5*n^2 operations for an input of size n. That means for an input of size 100, the first algorithm needs 100,000 ops and the second one only 50,000 ops. So, for input sizes around 100 you should choose the second one, even though it is quadratic and the first one is linear. Comparing the two cost functions, n0 = 200, because only for n greater than 200 does the quadratic function become more expensive than the linear one (here I assume that k equals 1).
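A one-liner (my own check, matching the numbers in this answer) confirms the crossover point:

```python
# First n where the quadratic algorithm (5*n^2 ops) costs more
# than the linear one (1000*n ops): 5*n^2 > 1000*n  <=>  n > 200.
n0 = next(n for n in range(1, 10**4) if 5 * n**2 > 1000 * n)
print(n0)  # 201
```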
n is the problem size, however that is best measured. n0 is then a specific threshold value of n after which the relationship holds; its exact value is irrelevant for big-oh, which cares only about its existence.
k is also an arbitrary constant, whose mere existence (in conjunction with n0) is what matters for big-oh.
Naturally, people are also interested in smaller problems, and in fact the perfect algorithm for a big problem might be decidedly inefficient for a small one, due to the constants involved.
It means the first value for n for which the rest holds true (i.e. we're only interested in high enough values for n)
Problem size, usually the size of the input.
It means you don't care about the difference (for example) between 3*n^2 and 400*n^2, so any value of k that is high enough to satisfy the inequality is OK.
All of these conditions aim to simplify the O notation, making the difference between simple and complex operations moot (e.g. you don't care whether an operation takes one cycle or 20 cycles, as long as the number is finite).

How will you know if k is constant for O(n^k) in subset sum?

Subset sum problem: given a set of numbers S and a target number, let's say 0, the aim is to find a subset S' of S whose elements add up to 0. I heard that this problem becomes polynomial if the size of S' is given.
For example, if you have a clue that 3 elements add up to 0 we can come up with complexity O(n^3).
The class P consists of those problems that are solvable in polynomial time; that is, they can be solved in O(n^k) for some constant k, where n is the size of the input.
$\binom{n}{k}$ denotes the number of subsets of size k of [n], a set with n elements; or, equivalently, the number of ways in which we can select k different elements from an n-element set. (A k-subset of a set is a subset with k elements.)
Therefore, let's say there is a polynomial-time algorithm that finds or locates the k-subset summing to 0 among n elements, where k is an input of the algorithm and can also be greater than 3. Can we still say k is constant?
If the running time of an algorithm for some problem with some input is bounded by O(n^k), it depends on various things whether this runtime bound is considered to be polynomial, pseudo-polynomial or none of these. If k is some specific constant like k=3, the bound is polynomial. If k is part of the input, the runtime bound is not considered polynomially bounded.
The concept of runtime bounds is briefly explained here; note, however, that informal usage of the term 'polynomial runtime bound' is usually somewhat sloppy. In the most exact sense, an algorithm A solving a problem P can have a runtime bound that is polynomially bounded in the encoding length of its input. This means that the bound is also to be seen in relation to the specific encoding of instances of P for A.
Furthermore, as a binary encoding of numbers is usually used for algorithms, the encoded numbers may grow exponentially in their encoding length. If A has a runtime bound that is polynomially bounded in a numeric value of its input, but not polynomially bounded in the encoding length of the input, the bound is said to be pseudo-polynomial, as briefly explained here.
I hope this helps; the specific details are usually a bit inaccurate in informal explanations.
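A minimal Python sketch (illustrative only) of the fixed-k approach: it enumerates all C(n, k) subsets, so it runs in O(n^k) time, which is polynomial only when k is a constant rather than part of the input.

```python
from itertools import combinations

def has_k_subset_with_sum(nums, k, target=0):
    """Look for a k-element subset of nums summing to target.

    Enumerates all C(n, k) = O(n^k) subsets: polynomial time for a
    fixed constant k, but not if k is part of the input.
    """
    return any(sum(combo) == target for combo in combinations(nums, k))

print(has_k_subset_with_sum([-7, 2, 5, 3, -1], 3))  # True: -7 + 2 + 5 = 0
```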
