Comparing the deterministic and non-deterministic expressive power of FAs, PDAs, and TMs

I am somewhat confused and could not find an answer online. In terms of expressive power:
Non-deterministic FA, PDA, TM:
NFA < NPDA < NTM
Deterministic FA, PDA, TM (this is where I am confused):
DFA < PDA < TM?
As a whole:
DFA = NFA = e-NFA = RE < DPDA < NPDA = NCFL = DCFL < NTM = DTM?
Am I correct? If not, please correct me.

DFA < PDA < TM is correct.
Remember that DFA = NFA = RegEx in power: every NFA can be converted to an equivalent DFA, every DFA to an equivalent regular expression, and the other way around.
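The standard NFA-to-DFA conversion is the subset construction. Here is a minimal Python sketch of it (my own illustration, assuming nfa maps (state, symbol) pairs to sets of successor states, with epsilon-moves omitted):

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start_state = frozenset([start])
    dfa = {}                              # maps (dfa_state, symbol) -> dfa_state
    todo, seen = [start_state], {start_state}
    while todo:
        state = todo.pop()
        for symbol in alphabet:
            # The DFA successor is the union of the NFA successors.
            succ = frozenset(q2 for q in state
                                for q2 in nfa.get((q, symbol), ()))
            dfa[(state, symbol)] = succ
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return dfa   # a DFA state is accepting iff it contains an accepting NFA state

The construction can produce up to 2^n DFA states for an n-state NFA, which is why determinization preserves the language but not the size.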
Context-free languages are a little different. DPDA < NPDA: you can construct a DPDA only for a proper subset of the context-free languages (the deterministic CFLs), but you can construct an NPDA for any context-free language.
Turing machines, deterministic and non-deterministic, are equally powerful. Remember that you can simulate any NTM using a single-tape DTM.
So, as a whole,
DFA = NFA = e-NFA = RegEx < DPDA = DCFL < NPDA = NCFL < DTM = NTM
is the correct chain; the one error in yours is "NCFL = DCFL", since the deterministic CFLs are a proper subset of the CFLs.

Related

Integer linear programming on a 3-partition of a special set

Background: S is the set of sequences s of length 7 such that (1) each digit of s is a, b, or c, and (2) exactly one digit of s is c.
T is the set of sequences t of length 7 such that (1) each digit of t is a, b, or c, and (2) exactly two digits of t are c.
Is there a 3-partition S = A0 ⋃ A1 ⋃ A2, Ai ∩ Aj = ∅, with the following property: for every Aj and every t ∈ T, there is an s ∈ Aj such that there exists an n ∈ {1,2,3,4,5,6,7} with s_n ≠ t_n, t_n = c, and s_m = t_m for every m ≠ n, where s_n (resp. t_n) is the n-th digit of s (resp. t)?
For example, t = ccaabca and s = acaabca, where n = 1.
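To make the relation concrete, here is a small Python predicate that checks it (my own illustration; covers is a hypothetical helper name):

def covers(s, t):
    # True iff s and t differ in exactly one position n, and t has a c there.
    diffs = [n for n in range(7) if s[n] != t[n]]
    return len(diffs) == 1 and t[diffs[0]] == 'c'

print(covers("acaabca", "ccaabca"))   # True: the example above, with n = 1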
I used integer linear programming to attack the problem via LINGO. I do not know how to solve the original problem directly, so I'd like to make A0 as small as possible via LINGO first.
Here is the code:
MODEL:
SETS:
Y /1..448/: C, X;
Z /1..672/;
cooperation(Y, Z): A;
ENDSETS
DATA:
A = #the big incidence matrix#
C = #1,1,1,... 448 times 1#
ENDDATA
MIN = @SUM(Y: C * X);
@FOR(Y: @BIN(X));
@FOR(Z(j): @SUM(Y(i): X(i) * A(i, j)) > 1);
@FOR(Z(j): @SUM(Y(i): X(i) * A(i, j)) < 2);
END
But the code ran for a long time without producing an answer.
I would appreciate any answers to the original question, or suggestions for the LINGO code.
This seems like a coding-theory problem, and those tend to be very hard, especially for integer programming, due to symmetry (maybe you have access to a good solver with symmetry breaking, but I still tend to think that constraint programming will fare better). The smallest part of the partition has at most ⌊448/3⌋ = 149 strings, yet a quick constraint-solver setup (OR-Tools CP-SAT, below) couldn't get there in the time that I ran it.
import itertools
import operator

from ortools.sat.python import cp_model

# Build S (exactly one 'c') and T (exactly two 'c's).
S = set()
T = set()
for s in itertools.product("abc", repeat=7):
    k = s.count("c")
    if k == 1:
        S.add("".join(s))
    elif k == 2:
        T.add("".join(s))

def hamming(s, t):
    return sum(map(operator.ne, s, t))

model = cp_model.CpModel()
include = {s: model.NewBoolVar(s) for s in S}
# Each t in T must be covered by at least one selected s at Hamming distance 1.
for t in T:
    model.AddBoolOr([include[s] for s in S if hamming(s, t) == 1])
model.Minimize(sum(include.values()))
solver = cp_model.CpSolver()
solver.parameters.log_search_progress = True
status = solver.Solve(model)
print({s for s in include if solver.Value(include[s])})

How is `(d*a)mod(b)=1` written in Ruby?

How should I write this:
(d*a)mod(b)=1
in order to make it work properly in Ruby? I tried it on Wolfram, but their solution:
(da(b, d))/(dd) = -a/d
doesn't help me. I know a and b. I need to solve (d*a)mod(b)=1 for d in the form d=....
It's not clear what you're asking, and, depending on what you mean, a solution may be impossible.
First off, (da(b, d))/(dd) = -a/d is not a solution to that equation; rather, it's a misinterpretation of the notation as a partial derivative. What Wolfram Alpha actually gave you was the derivative ∂a(b, d)/∂d = -a/d, which is entirely unrelated.
Secondly, if you're trying to solve (d*a) mod b = 1 for d, you may be out of luck: whenever a and b have a common prime factor, no value of d satisfies the equation. If a and b are coprime, you can use the formula given in LutzL's answer (and there are then infinitely many solutions, since any d' ≡ d (mod b) also works).
Additionally, if you're looking to perform symbolic manipulation of equations, Ruby is likely not the proper tool. Consider using a CAS, like Python's SymPy or Wolfram Mathematica.
Finally, if you're just trying to compute (d*a)mod(b), the modulo operator in Ruby is %, so you'd write (d*a)%(b).
You are looking for the modular inverse of a modulo b.
For any two numbers a, b, the extended Euclidean algorithm
g, u, v = xgcd(a, b)
gives coefficients u, v such that
u*a + v*b = g
where g is the greatest common divisor. You need a and b coprime, preferably by ensuring that b is a prime number, to get g = 1, and then you can set d = u (or u % b, for a representative in 0...b-1).
A Ruby version of the algorithm:

def xgcd(a, b)
  # returns [g, u, v] with u*a + v*b == g == gcd(a, b)
  return [a, 1, 0] if b == 0
  q, r = a.divmod(b)      # a == q*b + r
  g, u, v = xgcd(b, r)    # g == u*b + v*r == v*a + (u - q*v)*b
  [g, v, u - q * v]
end

# g, u, v = xgcd(a, b); d = u % b then gives (d * a) % b == 1 whenever g == 1.

Understanding the Viterbi algorithm

I was looking for a precise step by step example of the Viterbi algorithm.
Considering sentence tagging with the input sentence as:
The cat saw the angry dog jump
And from this I would like to generate the most probable output as:
D N V T A N V
How do we use the Viterbi algorithm to get the above output using a trigram-HMM?
(PS: I'm looking for a precise step by step explanation, not a piece of code, or math representation. Assume all probabilities as numbers.)
Thanks a ton!
I suggest that you look it up in one of the available books, e.g. Chris Bishop's "Pattern Recognition and Machine Learning". The Viterbi algorithm is a really basic thing and has been described at various levels of detail in the literature.
For the Viterbi algorithm and a Hidden Markov Model, you first need the transition probabilities and the emission probabilities.
In your example, the transition probabilities are P(N|D), P(V|N), and the emission probabilities (assuming a bigram model) are P(the|D), P(cat|N).
Of course, in a real-world example there are many more words than the, cat, saw, and so on. You have to loop through all your training data to estimate P(the|D), P(cat|N), P(car|N). Then we use the Viterbi algorithm to find the most likely sequence of tags, such as
D N V T A N V
given your observations.
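As a rough sketch of that estimation step (my own toy illustration, not the poster's code; the two-sentence corpus is made up):

from collections import Counter

# Toy tagged corpus: lists of (word, tag) pairs.
corpus = [[("the", "D"), ("cat", "N"), ("saw", "V")],
          [("the", "D"), ("dog", "N"), ("jump", "V")]]

trans = Counter()       # counts for transitions, e.g. D -> N
emit = Counter()        # counts for emissions, e.g. N emits "cat"
tag_count = Counter()
for sentence in corpus:
    prev = "<s>"        # sentence-start marker
    for word, tag in sentence:
        trans[(prev, tag)] += 1
        emit[(tag, word)] += 1
        tag_count[tag] += 1
        prev = tag

# Maximum-likelihood estimates:
p_N_given_D = trans[("D", "N")] / float(tag_count["D"])     # P(N|D) = 1.0
p_cat_given_N = emit[("N", "cat")] / float(tag_count["N"])  # P(cat|N) = 0.5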
Here is my implementation of Viterbi.
import numpy as np

def viterbi(vocab, vocab_tag, words, tags, t_bigram_count, t_unigram_count,
            e_bigram_count, e_unigram_count, ADD_K):
    vocab_size = len(vocab)
    V = [{}]
    # Probability of the very first word, for each possible tag.
    for t in vocab_tag:
        prob = np.log2(float(e_bigram_count.get((words[0], t), 0) + ADD_K)) \
             - np.log2(float(e_unigram_count[t] + vocab_size * ADD_K))
        V[0][t] = {"prob": prob, "prev": None}
    for i in range(1, len(words)):
        V.append({})
        for t in vocab_tag:
            # Find the best previous tag for tag t at position i.
            max_trans_prob = -np.inf
            best_prev = None
            for prev_tag in vocab_tag:
                trans_prob = np.log2(float(t_bigram_count.get((t, prev_tag), 0) + ADD_K)) \
                           - np.log2(float(t_unigram_count[prev_tag] + vocab_size * ADD_K))
                if V[i - 1][prev_tag]["prob"] + trans_prob > max_trans_prob:
                    max_trans_prob = V[i - 1][prev_tag]["prob"] + trans_prob
                    best_prev = prev_tag   # remember the argmax, not the last tag tried
            max_prob = max_trans_prob \
                     + np.log2(float(e_bigram_count.get((words[i], t), 0) + ADD_K)) \
                     - np.log2(float(e_unigram_count[t] + vocab_size * ADD_K))
            V[i][t] = {"prob": max_prob, "prev": best_prev}
    # Get the most probable final state, then walk the backpointers.
    opt = []
    previous = None
    max_prob = max(value["prob"] for value in V[-1].values())
    for st, data in V[-1].items():
        if data["prob"] == max_prob:
            opt.append(st)
            previous = st
            break
    for i in range(len(V) - 2, -1, -1):
        opt.insert(0, V[i + 1][previous]["prev"])
        previous = V[i + 1][previous]["prev"]
    return opt

Fast solution to Subset sum

Consider this way of solving the Subset sum problem:
def subset_summing_to_zero (activities):
    subsets = {0: []}
    for (activity, cost) in activities.iteritems():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.iteritems():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
I have it from here:
http://news.ycombinator.com/item?id=2267392
There is also a comment which says that it is possible to make it "more efficient".
How?
Also, are there any other ways to solve the problem which are at least as fast as the one above?
Edit
I'm interested in any kind of idea which would lead to a speed-up. I found:
https://en.wikipedia.org/wiki/Subset_sum_problem#cite_note-Pisinger09-2
which mentions a linear-time algorithm. But I don't have the paper; perhaps you, dear people, know how it works? An implementation, perhaps? A completely different approach, perhaps?
Edit 2
There is now a follow-up:
Fast solution to Subset sum algorithm by Pisinger
I respect the alacrity with which you're trying to solve this problem! Unfortunately, you're trying to solve a problem that's NP-complete, meaning that any further improvement that breaks the polynomial-time barrier would prove that P = NP.
The implementation you pulled from Hacker News is consistent with the pseudo-polynomial-time dynamic programming solution, and any additional improvement must, by definition, advance the state of current research into this problem and all of its algorithmic isoforms. In other words: while a constant-factor speedup is possible, you're very unlikely to see an algorithmic improvement to this solution in the context of this thread.
However, you can use an approximate algorithm if you require a polytime solution with a tolerable degree of error. In pseudocode blatantly stolen from Wikipedia, this would be:
initialize a list S to contain one element 0
for each i from 1 to N do
    let T be a list consisting of xi + y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        // trim the list by eliminating numbers close to one another
        // and throw out elements greater than s
        if y + cs/N < z ≤ s, set y = z and add z to S
if S contains a number between (1 − c)s and s, output yes, otherwise no
Python implementation, preserving the original terms as closely as possible:
from bisect import bisect

def ssum(X, c, s):
    """ Simple impl. of the polytime approximate subset sum algorithm
    Returns True if the subset exists within our given error; False otherwise
    """
    S = [0]
    N = len(X)
    for xi in X:
        T = [xi + y for y in S]
        U = set().union(T, S)
        U = sorted(U)  # Coercion to list
        S = []
        y = U[0]
        S.append(y)
        for z in U:
            if y + (c * s) / N < z and z <= s:
                y = z
                S.append(z)
    if not c:  # For zero error, check equivalence
        return S[bisect(S, s) - 1] == s
    return bisect(S, (1 - c) * s) != bisect(S, s)
... where X is your bag of terms, c is your precision (between 0 and 1), and s is the target sum.
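For example (a quick sanity check of my own, not from the original answer):

print(ssum([1, 2, 3, 9], 0.1, 6))   # True: the subset {1, 2, 3} hits the target 6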
For more details, see the Wikipedia article.
(Additional reference, further reading on CSTheory.SE)
While my previous answer describes the polytime approximate algorithm for this problem, a request was specifically made for an implementation of Pisinger's polytime dynamic programming solution for the case where all xi in X are positive:
from bisect import bisect

def balsub(X, c):
    """ Simple impl. of Pisinger's generalization of KP for subset sum problems
    satisfying xi >= 0, for all xi in X. Returns the state array "st", which may
    be used to determine if an optimal solution exists to this subproblem of SSP.
    """
    if not X:
        return False
    X = sorted(X)
    n = len(X)
    b = bisect(X, c)
    r = X[-1]
    w_sum = sum(X[:b])
    stm1 = {}
    st = {}
    for u in range(c - r + 1, c + 1):
        stm1[u] = 0
    for u in range(c + 1, c + r + 1):
        stm1[u] = 1
    stm1[w_sum] = b
    for t in range(b, n + 1):
        for u in range(c - r + 1, c + r + 1):
            st[u] = stm1[u]
        for u in range(c - r + 1, c + 1):
            u_tick = u + X[t - 1]
            st[u_tick] = max(st[u_tick], stm1[u])
        for u in reversed(range(c + 1, c + X[t - 1] + 1)):
            for j in reversed(range(stm1[u], st[u])):
                u_tick = u - X[j - 1]
                st[u_tick] = max(st[u_tick], j)
        stm1 = dict(st)   # carry the state forward to the next item
    return st
Wow, that was headache-inducing. This needs proofreading, because, while it implements balsub, I can't define the right comparator to determine if the optimal solution to this subproblem of SSP exists.
I don't know much Python, but there is an approach called meet in the middle.
Pseudocode:

divide activities into two subarrays, A1 and A2
for both A1 and A2, calculate the subset hashes, H1 and H2, the way you do it in your question
for each (cost, a1) in H1
    if H2.contains(-cost)
        return a1 + H2[-cost]

This will allow you to double the number of elements of activities you can handle in reasonable time.
I apologize for "discussing" the problem, but a "Subset Sum" problem where the x values are bounded is not the NP version of the problem. Dynamic programming solutions are known for bounded-x-value problems. That is done by representing the x values as sums of unit lengths, and the dynamic programming solutions have a number of fundamental iterations that is linear in that total unit length of the x's. However, Subset Sum is in NP when the precision of the numbers equals N, that is, when the number of base-2 place values needed to state the x's is N. For N = 40, the x's have to be in the billions. In the NP problem the unit length of the x's increases exponentially with N, which is why the dynamic programming solutions are not a polynomial-time solution to the NP Subset Sum problem. That being the case, there are still practical instances of the Subset Sum problem where the x's are bounded and the dynamic programming solution is valid.
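For concreteness, here is a minimal sketch of that classic pseudo-polynomial dynamic program (my own illustration, for non-negative integers; the work is O(n * target), linear in the magnitude of the numbers but exponential in their bit length):

def subset_sum_dp(xs, target):
    reachable = {0}                  # sums achievable with some subset so far
    for x in xs:
        # Each element either extends an already-reachable sum or is skipped.
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5 = 9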
Here are three ways to make the code more efficient:

1. The code stores a list of activities for each partial sum. It is more efficient in terms of both memory and time to just store the most recent activity needed to make the sum, and work out the rest by backtracking once a solution is found.
2. For each activity the dictionary is repopulated with the old contents (subsets[prev_sum] = subset). It is faster to simply grow a single dictionary.
3. Splitting the values in two and applying a meet-in-the-middle approach.

Applying the first two optimisations results in the following code, which is more than 5 times faster:
def subset_summing_to_zero2 (activities):
    subsets = {0: -1}
    for (activity, cost) in activities.iteritems():
        for prev_sum in subsets.keys():
            new_sum = prev_sum + cost
            if 0 == new_sum:
                new_subset = [activity]
                while prev_sum:
                    activity = subsets[prev_sum]
                    new_subset.append(activity)
                    prev_sum -= activities[activity]
                return sorted(new_subset)
            if new_sum in subsets: continue
            subsets[new_sum] = activity
    return []
Also applying the third optimisation results in something like:
def subset_summing_to_zero3 (activities):
    A = activities.items()
    mid = len(A) // 2
    def make_subsets(A):
        subsets = {0: -1}
        for (activity, cost) in A:
            for prev_sum in subsets.keys():
                new_sum = prev_sum + cost
                if new_sum and new_sum in subsets: continue
                subsets[new_sum] = activity
        return subsets
    subsets = make_subsets(A[:mid])
    subsets2 = make_subsets(A[mid:])
    def follow_trail(new_subset, subsets, s):
        while s:
            activity = subsets[s]
            new_subset.append(activity)
            s -= activities[activity]
    new_subset = []
    for s in subsets:
        if -s in subsets2:
            follow_trail(new_subset, subsets, s)
            follow_trail(new_subset, subsets2, -s)
            if len(new_subset):
                break
    return sorted(new_subset)
Define bound to be the largest absolute value of the elements. The algorithmic benefit of the meet-in-the-middle approach depends a lot on bound.
For a low bound (e.g. bound = 1000 and n = 300), meet in the middle only gains a factor of about 2 over the first improved method. This is because the dictionary called subsets is densely populated.
However, for a high bound (e.g. bound = 100,000 and n = 30), meet in the middle takes 0.03 seconds compared to 2.5 seconds for the first improved method (and 18 seconds for the original code).
For high bounds, meet in the middle will take about the square root of the number of operations of the normal method.
It may seem surprising that meet in the middle is only twice as fast for low bounds. The reason is that the number of operations in each iteration depends on the number of keys in the dictionary. After adding k activities we might expect there to be 2**k keys, but if bound is small then many of these keys will collide, so we will only have O(bound·k) keys instead.
Thought I'd share my Scala solution for the pseudo-polytime algorithm discussed on Wikipedia. It's a slightly modified version: it figures out how many unique subsets there are. This is closely related to a HackerRank problem described at https://www.hackerrank.com/challenges/functional-programming-the-sums-of-powers. My coding style might not be excellent, I'm still learning Scala :) Maybe this is still helpful for someone.
import java.io.ByteArrayInputStream
import scala.io.StdIn.readInt

object Solution extends App {
  val input = "1000\n2"
  System.setIn(new ByteArrayInputStream(input.getBytes()))
  println(calculateNumberOfWays(readInt, readInt))

  def calculateNumberOfWays(X: Int, N: Int) = {
    val maxValue = Math.pow(X, 1.0 / N).toInt
    val listOfValues = (1 until maxValue + 1).toList
    val listOfPowers = listOfValues.map(value => Math.pow(value, N).toInt)
    val lists = (0 until maxValue).toList.foldLeft(List(List(0)): List[List[Int]])((newList, i) =>
      newList :+ (newList.last union (newList.last.map(y => y + listOfPowers.apply(i)).filter(z => z <= X)))
    )
    lists.last.count(_ == X)
  }
}

Find the words in a long stream of characters. Auto-tokenize

How would you find the correct words in a long stream of characters?
Input :
"The revised report onthesyntactictheoriesofsequentialcontrolandstate"
Google's Output:
"The revised report on syntactic theories sequential controlandstate"
(which is close enough considering the time that they produced the output)
How do you think Google does it?
How would you increase the accuracy?
I would try a recursive algorithm like this:
Try inserting a space at each position. If the left part is a word, then recur on the right part.
Count the number of valid words / number of total words in all the final outputs. The one with the best ratio is likely your answer.
For example, giving it "thesentenceisgood" would run:
thesentenceisgood
    the sentenceisgood
        sent enceisgood
            enceisgood: OUT1: the sent enceisgood, 2/3
        sentence isgood
            is good
                go od: OUT2: the sentence is go od, 4/5
                is good: OUT3: the sentence is good, 4/4
        sentenceisgood: OUT4: the sentenceisgood, 1/2
    these ntenceisgood
        ntenceisgood: OUT5: these ntenceisgood, 1/2
So you would pick OUT3 as the answer.
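A small Python sketch of that recursion (my own illustration; the dictionary is a plain set of known words):

def segmentations(text, words):
    # Yield (split, n_valid): every split into dictionary words, plus a
    # fallback that keeps the unsplittable remainder as one invalid token.
    if not text:
        yield [], 0
        return
    for i in range(1, len(text) + 1):
        if text[:i] in words:
            for rest, valid in segmentations(text[i:], words):
                yield [text[:i]] + rest, valid + 1
    yield [text], 0   # the whole remainder as a single invalid "word"

def best_split(text, words):
    # Pick the split with the best ratio of valid words to total words.
    return max(segmentations(text, words), key=lambda r: float(r[1]) / len(r[0]))[0]

words = {"the", "these", "sent", "sentence", "is", "go", "good"}
print(best_split("thesentenceisgood", words))   # ['the', 'sentence', 'is', 'good']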
Try a stochastic regular grammar (equivalent to hidden Markov models) with the following rules:

for every word_i in a dictionary:
    stream -> word_i stream    with probability p_w
    word_i -> letter_i1 ... letter_in    with probability q_w (this is the spelling of word_i)
stream -> letter stream    with probability p (for any letter)
stream -> epsilon    with probability 1

The probabilities could be derived from a training set, but see the following discussion.
The most likely parse is computed using the Viterbi algorithm, which has quadratic time complexity in the number of hidden states, in this case your vocabulary, so you could run into speed issues with large vocabularies.
But what if you set all the p_w = 1, q_w = 1, and p = 0.5? This means these are probabilities in an artificial language model where all words are equally likely and all non-words are equally unlikely. Of course you could segment better if you didn't use this simplification, but the algorithm complexity goes down by quite a bit. If you look at the recurrence relation in the Wikipedia entry, you can try to simplify it for this special case. The Viterbi parse probability up to position k simplifies to
VP(k) = max_l ( VP(k - l) * (1 if text[k-l:k] is a word else 0.5^l) )
You can bound l by the maximum length of a word and check whether l letters form a word with a hash lookup. The complexity of this is independent of the vocabulary size: it is O(text length × max l). Sorry, this is not a proof, just a sketch, but it should get you going.
Another potential optimization: if you create a trie of the dictionary, you can check whether a substring is a prefix of any correct word. So when you query text[k-l:k] and get a negative answer, you already know that the same is true for text[k-l:k+d] for any d. To take advantage of this you would have to rearrange the recursion significantly, so I am not sure this can be fully exploited (it can, see comment).
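Here is a minimal Python sketch of that simplified recurrence (my own illustration, using a plain set for the dictionary rather than a trie):

def segment(text, words, max_len=20):
    # vp[k] = best parse probability of text[:k], per
    # VP(k) = max_l VP(k-l) * (1 if text[k-l:k] is a word else 0.5**l)
    n = len(text)
    vp = [1.0] + [0.0] * n
    back = [0] * (n + 1)           # start index of the last chunk
    for k in range(1, n + 1):
        for l in range(1, min(k, max_len) + 1):
            chunk = text[k - l:k]
            p = vp[k - l] * (1.0 if chunk in words else 0.5 ** l)
            if p > vp[k]:
                vp[k], back[k] = p, k - l
    out, k = [], n                 # walk the backpointers
    while k > 0:
        out.append(text[back[k]:k])
        k = back[k]
    return out[::-1]

print(segment("thesentenceisgood", {"the", "sentence", "is", "good"}))
# -> ['the', 'sentence', 'is', 'good']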
Here is code in Mathematica I started to develop for a recent code golf.
It is a minimal-matching, non-greedy, recursive algorithm. That means that the sentence "the pen is mighter than the sword" (without spaces) returns {"the pen is might er than the sword"} :)
findAll[s_] :=
Module[{a = s, b = "", c, sy = "="},
While[
StringLength[a] != 0,
j = "";
While[(c = findFirst[a]) == {} && StringLength[a] != 0,
j = j <> StringTake[a, 1];
sy = "~";
a = StringDrop[a, 1];
];
b = b <> " " <> j ;
If[c != {},
b = b <> " " <> c[[1]];
a = StringDrop[a, StringLength[c[[1]]]];
];
];
Return[{StringTrim[StringReplace[b, "  " -> " "]], sy}];
]
findFirst[s_] :=
If[s != "" && (c = DictionaryLookup[s]) == {},
findFirst[StringDrop[s, -1]], Return[c]];
Sample Input
ss = {"twodreamstop",
"onebackstop",
"butterfingers",
"dependentrelationship",
"payperiodmatchcode",
"labordistributioncodedesc",
"benefitcalcrulecodedesc",
"psaddresstype",
"ageconrolnoticeperiod",
"month05",
"as_benefits",
"fname"}
Output
twodreamstop = two dreams top
onebackstop = one backstop
butterfingers = butterfingers
dependentrelationship = dependent relationship
payperiodmatchcode = pay period match code
labordistributioncodedesc ~ labor distribution coded es c
benefitcalcrulecodedesc ~ benefit c a lc rule coded es c
psaddresstype ~ p sad dress type
ageconrolnoticeperiod ~ age con rol notice period
month05 ~ month 05
as_benefits ~ as _ benefits
fname ~ f name
HTH
Check out spelling-correction algorithms. Here is a link to an article on the algorithm used at Google: http://www.norvig.com/spell-correct.html. There you will also find a scientific paper on this topic from Google.
After doing the recursive splitting and dictionary lookup, to increase the quality of the word pairs in your phrase, you might be interested in employing the mutual information of word pairs.
This is essentially going through a training set and finding out mutual-information values of word pairs, which tell you that Albert Simpson is less likely than Albert Einstein :)
You can try searching Science Direct for academic papers on this theme. For basic information on mutual information, see http://en.wikipedia.org/wiki/Mutual_information
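As a minimal sketch of the idea (my own toy illustration, not the C++ project mentioned below):

import math
from collections import Counter

tokens = ("albert einstein wrote papers albert einstein played violin "
          "homer simpson ate donuts albert simpson never existed").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

def pmi(x, y):
    # log2( P(x,y) / (P(x)P(y)) ): higher for pairs seen together more than chance.
    p_xy = bigrams[(x, y)] / float(n_bi)
    return math.log(p_xy / ((unigrams[x] / float(n_uni)) * (unigrams[y] / float(n_uni))), 2)

print(pmi("albert", "einstein") > pmi("albert", "simpson"))   # True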
Last year I was involved in the phrase-search part of a search-engine project, in which I was trying to parse through the Wikipedia dataset and rank each word pair. I've got the code in C++ and could share it with you if you can find some use for it. It parses Wikimedia and, for every word pair, finds out the mutual information.
