Computability: SAT Formula with Bounded Number of Clauses

Define SAT2016 = {\phi | \phi is a satisfiable CNF formula with at most 2016 clauses}.
Assuming P \neq NP, is SAT2016 NP-complete?
Since the number of literals in each clause isn't bounded, it's not immediately clear whether there exists a polynomial time algorithm for checking the satisfiability of a formula with a constant bound on the number of clauses.
Your ideas are welcome.

SAT2016 is in P.
Observe that in order to satisfy the formula you have to assign 1 to at least one literal of every clause. Each clause contains at most 2n literals. Therefore, the number of ways to choose a single literal from every clause is at most (2n)^2016. In order to find out whether the formula is satisfiable, you should iterate over (at most) (2n)^2016 possibilities (to choose a single literal from every clause) and check for each possibility if it's legal. That is, for each choice of 2016 literals (one from each clause) you should check if two of the 2016 literals happen to be a certain variable and its negation. If that is the case, you move on to the next choice of 2016 literals. If you went through all the (2n)^2016 possibilities and found out that they all contain a conflict, you conclude that the formula is unsatisfiable.
Since there are at most (2n)^2016 possibilities, and checking a given possibility takes constant time (because you just have to loop over all possible pairs in a set of 2016 literals), the running time of the algorithm is polynomial in n.
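Here is a minimal sketch of this brute force, assuming the formula is given as a list of clauses where each literal is a (variable, polarity) pair (this encoding is my own, not part of the question):

    from itertools import product

    def bounded_clause_sat(clauses):
        # Try every way of picking one literal per clause: with k clauses and
        # at most 2n literals per clause this is at most (2n)^k choices,
        # which is polynomial in n for constant k (k = 2016 above).
        for choice in product(*clauses):
            assignment = {}
            consistent = True
            for var, polarity in choice:
                if assignment.get(var, polarity) != polarity:
                    consistent = False   # conflict: needs both x and its negation
                    break
                assignment[var] = polarity
            if consistent:
                return True              # all chosen literals can be set to 1
        return False

    # Example: (x1 OR NOT x2) AND (x2 OR x3)
    print(bounded_clause_sat([[(1, True), (2, False)], [(2, True), (3, True)]]))  # True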

Calculating the hash of any substring in logarithmic time

Question came up in relation to this article:
https://threads-iiith.quora.com/String-Hashing-for-competitive-programming
The author presents this algorithm for hashing a string: Hash(S) = ∑_i S_i · p^i (mod m),
where S is our string, S_i is the character at index i, and p is a prime number we've chosen.
He then presents the problem of determining whether a substring of a given string is a palindrome and claims it can be done in logarithmic time through hashing.
He makes the point that we can calculate the hash from the beginning of our whole string to the right edge of our substring, F(R) = ∑_{i=0}^{R} S_i · p^i,
and observes that if we calculate the hash from the beginning to the left edge of our substring (F(L-1)), the difference between this and our hash to our right edge is basically the hash of our substring: F(R) - F(L-1) = ∑_{i=L}^{R} S_i · p^i.
This is all fine, and I think I follow it so far. But he then immediately makes the claim that this allows us to calculate our hash (and thus determine if our substring is a palindrome by comparing this hash with the one generated by moving through our substring in reverse order) in logarithmic time.
I feel like I'm probably missing something obvious but how does this allow us to calculate the hash in logarithmic time?
You already know that you can calculate the difference in constant time. Let me restate the difference (I'll leave the modulo away for clarity):
diff = ∑_{i=L}^{R} S_i · p^i
Note that this is not the hash of the substring because the powers of p are offset by a constant. Instead, this is (as stated in the article)
diff = Hash(S[L,R])∗p^L
To derive the hash of the substring, you have to multiply the difference with p^-L. Assuming that you already know p^-1 (this can be done in a preprocessing step), you need to calculate (p^-1)^L. With the square-and-multiply method, this takes O(log L) operations, which is probably what the author refers to.
This may become more efficient if your queries are sorted by L. In this case, you could calculate p^-L incrementally.
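A minimal sketch of the idea, with my own choice of base and modulus (the article's exact constants are not shown here): prefix hashes give diff in O(1), and Python's built-in pow does the square-and-multiply step for p^-L.

    M = 10**9 + 7   # large prime modulus (assumed)
    P = 31          # chosen prime base (assumed)

    def prefix_hashes(s):
        # F[k] = sum_{i<=k} s[i] * P^i  (mod M)
        hashes, h, power = [], 0, 1
        for ch in s:
            h = (h + ord(ch) * power) % M
            hashes.append(h)
            power = power * P % M
        return hashes

    def substring_hash(F, left, right):
        # diff = F[right] - F[left-1] = Hash(S[left, right]) * P^left, so multiply by
        # P^-left; pow(..., M-2, M) is square-and-multiply, O(log) multiplications.
        diff = F[right] - (F[left - 1] if left > 0 else 0)
        inv_p_left = pow(pow(P, left, M), M - 2, M)
        return diff * inv_p_left % M

    s = "abacaba"
    F = prefix_hashes(s)
    print(substring_hash(F, 2, 4) == prefix_hashes("aca")[-1])  # True: hash of s[2..4]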

Recursive language

"if a language is recursive, then there exists a method by which the strings in language can be written in some sequence"
I am also told that "if a language can be enumerated in a lexicographic order by some Turing machine, then such a language is called recursive"
First: are the two statements different?
Second : should it be a lexicographic order only?
Think back to the reason why an ordering (such as lexicographical order) appears in the definition of a recursive language:
A language is recursive if it can be decided. That is to say, for a given word W and a given language L, it is possible to determine, in finitely many steps, whether W is a member of L or not.
A language is recursively enumerable if it can be accepted. That is to say, for a given word W and a given language L, it is possible to confirm in finitely many steps that W is a member of L, but it is not in general possible to confirm that W is not a member of L.
So if a machine just enumerated the words of L in any order, you can check to see if your word W is in that list. If it is, you stop. If it isn't, you have to wait forever to see if your word is eventually output by the machine. The language is recursively enumerable.
If you knew the order, though, you could evaluate whether the machine should have output W by now. If the machine has output a word X, and according to the ordering you know the machine is using, W is before X, you know that the machine will not ever emit W, so you know that W is not a member of L.
Lexicographical order is one of many total orderings of words that satisfy the property that you can tell when your word W should have been output, so if you don't see it by then, you can stop.
Other orders:
https://en.wikipedia.org/wiki/Lexicographical_order#Colexicographic_order
https://en.wikipedia.org/wiki/Kleene%E2%80%93Brouwer_order
So to answer your specific questions:
Are the two statements different?
Yes.
The first statement states "in some sequence", which does not specify that the sequence must be a total order over L's alphabet. Therefore the first statement is incorrect. The first statement defines a recursively enumerable language.
The second statement is correct, but is more restrictive than it needs to be. Lexicographical order is only one total order over an alphabet. Others can be used.
Should it be a lexicographical order only?
No.
As above, as long as the machine guarantees output in some computable total order under which you can tell when W should have appeared, the language is recursive.
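A minimal sketch of that argument, assuming enumerate_L() is a generator yielding the words of L in shortlex (length-then-lexicographic) order; any computable ordering with the "you can tell when W should have appeared" property works the same way:

    def shortlex_key(w):
        return (len(w), w)

    def decide_membership(w, enumerate_L):
        # Halts on every input, so L is decided rather than merely accepted.
        for x in enumerate_L():
            if x == w:
                return True                        # W was emitted
            if shortlex_key(x) > shortlex_key(w):
                return False                       # the enumeration has passed W for good
        return False                               # finite L: enumeration ended without W

    # Toy enumerator for L = {"a", "ab", "bb"}
    print(decide_membership("ab", lambda: iter(["a", "ab", "bb"])))  # True
    print(decide_membership("ba", lambda: iter(["a", "ab", "bb"])))  # False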

Maximize evaluation of expression with one parenthesis insertion

I encountered this problem in a programming contest:
Given an expression x1 op x2 op x3 op ... op xn, where each op is either addition '+' or multiplication '*' and each xi is a digit between 1 and 9, the goal is to insert just one pair of parentheses into the expression so as to maximize its value.
Here n is at most 2500.
Eg.:
Input:
3+5*7+8*4
Output:
303
Explanation:
3+5*(7+8)*4
There was another constraint given in the problem: at most 15 '*' signs will be present. This simplified the problem, as we will have just 17 options of bracket insertion and brute force would work in O(17·n).
I have been wondering: if this constraint were not present, could I theoretically solve the problem in O(n^2)? It seemed like a DP problem to me. I say theoretically because the answers can be huge (up to about 9^2500). So, ignoring the cost of arithmetic on such big numbers, is O(n^2) possible?
If there is no multiplication, you are finished.
If there is no addition, you are finished.
The first and last operations inside the parenthesized subterm should always be additions, because a parenthesis whose contents start or end with a multiplication can be shrunk past that multiplication without decreasing the result (and a parenthesis around a pure multiplication does not alter the outcome at all).
If a subterm contains only additions, you do not need to evaluate subparts of it: multiplying by the full subterm will always be at least as large, since we only have positive digits.
Traverse the term once, trying to place the opening parenthesis after (in the worst case) each '*' that is followed by a '+', and within that loop traverse a second time, trying to place the closing parenthesis before (in the worst case) each subsequent '*' that is immediately preceded by a '+'.
You can solve the problem in O(m·a/2), with m the number of multiplications and a the number of additions. This is smaller than n^2.
Possible places for parenthesis shown with ^:
1*2*^3+4+5^*6*^7+8^
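A minimal sketch of that traversal (my own code; eval is used only to keep the sketch short): the opening parenthesis is tried at the start or just after each '*', the closing one at the end or just before each '*', and every candidate expression is evaluated directly, so this is the straightforward O(candidates · n) version rather than a tightly optimised one.

    def max_with_one_paren(expr):
        tokens = list(expr)
        n = len(tokens)
        opens = [0] + [i + 1 for i, t in enumerate(tokens) if t == '*']   # first char inside the parens
        closes = [i for i, t in enumerate(tokens) if t == '*'] + [n]      # char just past the parens
        best = eval(expr)                                                 # baseline: no parenthesis
        for lo in opens:
            for hi in closes:
                if hi <= lo:
                    continue
                candidate = expr[:lo] + '(' + expr[lo:hi] + ')' + expr[hi:]
                best = max(best, eval(candidate))
        return best

    print(max_with_one_paren("3+5*7+8*4"))  # 303, from 3+5*(7+8)*4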

Tokenize valid words from a long string

Suppose you have a dictionary that contains valid words.
Given an input string with all spaces removed, determine whether the string is composed of valid words or not.
You can assume the dictionary is a hashtable that provides O(1) lookup.
Some examples:
helloworld-> hello world (valid)
isitniceinhere-> is it nice in here (valid)
zxyy-> invalid
If a string has multiple possible parsings, just returning true is sufficient.
The string can be very long, so aim for an algorithm that is both space- and time-efficient.
I think the set of all strings that occur as a concatenation of valid words (words taken from a finite dictionary) forms a regular language over the alphabet of characters. You can then build a finite automaton that accepts exactly the strings you want; computation time is O(n).
For instance, let the dictionary consist of the words {bat, bag}. Then we construct the following automaton: states are denoted by 0, 1, 2. Edges: (0,1,b), (1,2,a), (2,0,t), (2,0,g); where the triple (x,y,z) means an edge leading from x to y on input z. The only accepting state is 0. In each step, on reading the next input sign, you have to calculate the set of states that are reachable on that input. Given that the number of states in the automaton is constant, this is of complexity O(n). As for space complexity, I think you can do with O(number of words) with the hint for construction above.
For another example, with the words {bag, bat, bun, but}, the automaton would have states 0, 1, 2, 3 and edges (0,1,b), (1,2,a), (1,3,u), (2,0,g), (2,0,t), (3,0,n), (3,0,t), with 0 again the only accepting state.
Supposing that the automaton has already been built (the time to do this has something to do with the length and number of words :-) we now argue that the time to decide whether a string is accepted by the automaton is O(n) where n is the length of the input string.
More formally, our algorithm is as follows:
Let S be a set of states, initially containing the starting state.
Read the next input character, let us denote it by a.
For each element s in S, determine the state that we move into from s on reading a; that is, the state r such that with the notation above (s,r,a) is an edge. Let us denote the set of these states by R. That is, R = {r | s in S, (s,r,a) is an edge}.
(If R is empty, the string is not accepted and the algorithm halts.)
If there are no more input symbols, check whether any of the accepting states is in R. (In our case, there is only one accepting state, the starting state.) If so, the string is accepted, if not, the string is not accepted.
Otherwise, take S := R and go to 2.
Now, there are as many executions of this cycle as there are input symbols. The only thing we have to examine is that steps 3 and 5 take constant time. Given that the size of S and R is not greater than the number of states in the automaton, which is constant and that we can store edges in a way such that lookup time is constant, this follows. (Note that we of course lose multiple 'parsings', but that was not a requirement either.)
I think this is actually called the membership problem for regular languages, but I couldn't find a proper online reference.
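Here is a minimal sketch of this simulation (my own code): instead of building the trie explicitly, a state is represented as (word, characters matched so far), with a special ROOT state standing for "between words"; finishing a word loops back to ROOT exactly as in the {bat, bag} example.

    def accepts(text, words):
        words = list(words)
        ROOT = "ROOT"                       # start state, also the only accepting state
        states = {ROOT}
        for ch in text:
            nxt = set()
            for st in states:
                # From ROOT we may start any word; otherwise we continue the current one.
                candidates = [(w, 0) for w in words] if st == ROOT else [st]
                for w, i in candidates:
                    if w[i] == ch:
                        nxt.add(ROOT if i + 1 == len(w) else (w, i + 1))
            if not nxt:
                return False                # R is empty: reject early
            states = nxt
        return ROOT in states               # accepted iff we can end exactly at a word boundary

    print(accepts("helloworld", {"hello", "world"}))                      # True
    print(accepts("isitniceinhere", {"is", "it", "nice", "in", "here"}))  # True
    print(accepts("zxyy", {"hello", "world"}))                            # False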
I'd go for a recursive algorithm with implicit backtracking. Function signature: f: input -> result, where input is the string and result is either true or false, depending on whether the entire string can be tokenized correctly.
Works like this:
If input is the empty string, return true.
Look at the length-one prefix of input (i.e., the first character). If it is in the dictionary, run f on the remaining suffix of input. If that returns true, return true as well.
If the current prefix is not in the dictionary, or the invocation of f in the previous step returned false, make the prefix longer by one character and repeat the previous step with it. If the prefix cannot be made any longer (it already spans the whole string), return false.
Rinse and repeat.
For dictionaries with low to moderate amount of ambiguous prefixes, this should fetch a pretty good running time in practice (O(n) in the average case, I'd say), though in theory, pathological cases with O(2^n) complexity can probably be constructed. However, I doubt we can do any better since we need backtracking anyways, so the "instinctive" O(n) approach using a conventional pre-computed lexer is out of the question. ...I think.
EDIT: the estimate for the average-case complexity is likely incorrect, see my comment.
Space complexity would be only stack space, so O(n) even in the worst-case.
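A minimal sketch of this backtracking (my own naming): every prefix that is a dictionary word is tried, and f recurses on the remainder; the recursion depth is at most n, matching the O(n) space bound.

    def can_tokenize(s, dictionary):
        if s == "":
            return True                            # empty remainder: fully tokenized
        for end in range(1, len(s) + 1):           # grow the prefix one character at a time
            if s[:end] in dictionary and can_tokenize(s[end:], dictionary):
                return True
        return False

    words = {"is", "it", "nice", "in", "here"}
    print(can_tokenize("isitniceinhere", words))   # True
    print(can_tokenize("zxyy", words))             # False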

Puzzle: Need an example of a "complicated" equivalence relation / partitioning that disallows sorting and/or hashing

From the question "Is partitioning easier than sorting?":
Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.
One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.
(Keep in mind the distinction between equality and equivalence.)
Clearly the equivalence relation must be considered when designing the ordering algorithm. For example, if the equivalence relation is "people born in the same year are equivalent", then sorting based on the person's name is not appropriate.
Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?
How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that will map equivalent items to the same hash value.
(Note: it is OK if nonequivalent items map to the same hash value (collide) -- I'm not asking to solve the collision problem -- but on the other hand, hashFunc(item) { return 1; } is cheating.)
My suspicion is that for any datatype/equivalence pair where it is possible to define an ordering, it will also be possible to define a suitable hash function, and they will have similar algorithmic complexity. A counterexample to that conjecture would be enlightening!
The answer to questions 1 and 2 is no, in the following sense: given a computable equivalence relation ≡ on strings {0, 1}*, there exists a computable function f such that x ≡ y if and only if f(x) = f(y), which leads to an order/hash function. One definition of f(x) is simple, and very slow to compute: enumerate {0, 1}* in lexicographic order (ε, 0, 1, 00, 01, 10, 11, 000, …) and return the first string equivalent to x. We are guaranteed to terminate when we reach x, so this algorithm always halts.
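A minimal sketch of this f, assuming equiv(x, y) is some computable (and here fast) equivalence on bit strings; the example relation is my own, purely for illustration:

    from itertools import count, product

    def all_bitstrings():
        # Enumerate {0,1}* in the order epsilon, 0, 1, 00, 01, 10, 11, 000, ...
        for length in count(0):
            for bits in product("01", repeat=length):
                yield "".join(bits)

    def canonical(x, equiv):
        # Return the first string equivalent to x; halts at the latest when it reaches x.
        for y in all_bitstrings():
            if equiv(x, y):
                return y

    # Toy equivalence: strings with the same number of 1s are equivalent.
    same_ones = lambda a, b: a.count("1") == b.count("1")
    print(canonical("1010", same_ones))   # "11": hash or compare this representative instead of "1010"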
Creating a hash function and an ordering may be expensive but will usually be possible. One trick is to represent an equivalence class by a pre-arranged member of that class, for instance, the member whose serialised representation is smallest, when considered as a bit string. When somebody hands you a member of an equivalence class, map it to this canonicalised member of that class, and then hash or compare the bit string representation of that member. See e.g. http://en.wikipedia.org/wiki/Canonical#Mathematics
Examples where this is not possible or convenient include when somebody gives you a pointer to an object that implements equals() but nothing else useful, and you do not get to break the type system to look inside the object, or when you get the results of a survey that only asks people to judge equality between objects. Also, Kruskal's algorithm uses union-find internally to process equivalence relations, so presumably for this particular application nothing more cost-effective has been found.
One example that seems to fit your request is an IEEE floating point type. In particular, a NaN doesn't compare as equivalent to anything else (nor even to itself) unless you take special steps to detect that it's a NaN, and always call that equivalent.
Likewise for hashing. If memory serves, +0.0 and -0.0 have different bit patterns (they differ only in the sign bit) yet compare as exactly equal; the idea is the same in any case -- two distinct bit patterns can denote the same value. Unless your hash function takes this into account, it will produce different hash values for numbers that really compare precisely equal.
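A small illustration of both pitfalls, assuming the platform's floats follow IEEE 754 (true on ordinary hardware):

    import math, struct

    nan = float("nan")
    print(nan == nan)                                          # False: NaN is not equal even to itself
    print(0.0 == -0.0)                                         # True: equal values...
    print(struct.pack(">d", 0.0) == struct.pack(">d", -0.0))   # False: ...with different bit patterns
    # A hash built naively from the bit pattern would separate 0.0 from -0.0, and any
    # equivalence that wants all NaNs in one class needs an explicit check:
    print(math.isnan(nan))                                     # True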
As you probably know, comparison-based sorting takes at least O(n log n) time (more formally you would say it is Omega(n log n)). If you know that there are fewer than log2(n) equivalence classes, then partitioning is faster, since you only need to check equivalence with a single member of each equivalence class to determine which part in the partition you should assign a given element to.
I.e. your algorithm could be like this:
For each x in our input set X:
    For each equivalence class Y seen so far:
        Choose any member y of Y.
        If x is equivalent to y:
            Add x to Y.
            Resume the outer loop with the next x in X.
    If we get to here then x is not in any of the equiv. classes seen so far.
    Create a new equivalence class with x as its sole member.
If there are m equivalence classes, the inner loop runs at most m times, taking O(nm) time overall. As ShreetvatsaR observes in a comment, there can be at most n equivalence classes, so this is O(n^2). Note this works even if there is not a total ordering on X.
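A minimal sketch of this partitioning, assuming nothing but an equivalence predicate (the example relation is mine):

    def partition(items, equiv):
        classes = []                        # one list per equivalence class
        for x in items:
            for cls in classes:
                if equiv(x, cls[0]):        # compare against a single member of each class
                    cls.append(x)
                    break
            else:
                classes.append([x])         # x starts a new class
        return classes

    # Example: integers are equivalent when they have the same remainder mod 3.
    print(partition(range(10), lambda a, b: a % 3 == b % 3))
    # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]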
Theoretically, it is always possible (for questions 1 and 2), because of the Well Ordering Theorem, even when you have an uncountable number of partitions.
Even if you restrict to computable functions, throwawayaccount's answer answers that.
You need to more precisely define your question :-)
In any case, practically speaking, consider the following: your data type is the set of unsigned integer arrays, and the ordering is lexicographic comparison.
You could consider hash(x) = x, but I suppose that is cheating too :-)
I would say (but haven't thought more about getting a hash function, so might well be wrong) that partitioning by ordering is much more practical than partitioning by hashing, as hashing itself could become impractical. (A hashing function exists, no doubt).
I believe that...
1- Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?
...it's possible only for infinite (possibly only for non-countable) sets.
2- How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that will map equivalent items to the same hash value?
...same as above.
EDIT: This answer is wrong.
I am not going to delete it, because some of the comments below are enlightening.
Not every equivalence relationship implies an order
As your equivalence relationship should not induce an order, let's take an un-ordered distance function as the relation.
If we take the set of functions f: R -> R as our datatype, and define an equivalence relation as:
f is equivalent to g if f(g(x)) = g(f(x)) for all x (commuting operators),
then you can't sort on that relation (no injective map into the real numbers exists); you just can't find a function which maps your datatype to numbers, due to the cardinality of the function space.
Suppose that F(X) is a function which maps an element of some data type T to another of the same type, such that for any Y of type T, there is exactly one X of type T such that F(X)=Y. Suppose further that the function is chosen so that there is generally no practical way of finding the X in the above equation for a given Y.
Define F{0}(X)=X, F{1}(X)=F(X), F{2}(X)=F(F(X)), etc., so that F{n}(X) = F(F{n-1}(X)).
Now define a data type Q containing a positive integer K and an object X of type T. Define an equivalence relation thus:
Q(a,X) vs Q(b,Y):
If a > b, the items are equal iff F{a-b}(Y)==X
If a < b, the items are equal iff F{b-a}(X)==Y
If a=b, the items are equal iff X==Y
For any given object Q(a,X) there exists exactly one Z such that F{a}(Z)==X. Two objects are equivalent iff they would have the same Z. One could define an ordering or hash function based upon Z. On the other hand, if F is chosen such that its inverse cannot be practically computed, the only practical way to compare elements may be to use the equivalence function above. I know of no way to define an ordering or hash function without either knowing the largest possible 'a' value an item could have, or having a means to invert the function F.
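A minimal sketch of this construction with a toy stand-in for F (a real instance would use a function whose inverse is genuinely impractical to compute; all names and constants here are mine):

    P = 2**61 - 1                 # prime modulus for the toy F (assumption)

    def F(x):
        # Toy bijection on [0, P): easy to invert here, but pretend it is not.
        return (3 * x + 7) % P

    def F_iter(x, n):
        for _ in range(n):
            x = F(x)
        return x

    def equivalent(q1, q2):
        # q = (a, X) represents X = F^a(Z); two objects are equivalent iff they share Z.
        (a, X), (b, Y) = q1, q2
        if a > b:
            return F_iter(Y, a - b) == X
        if a < b:
            return F_iter(X, b - a) == Y
        return X == Y

    Z = 12345
    print(equivalent((2, F_iter(Z, 2)), (5, F_iter(Z, 5))))    # True: same underlying Z
    print(equivalent((2, F_iter(Z, 2)), (3, F_iter(99, 3))))   # False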
