{ w | at every odd position of w is a 1} - computation-theory

The task is to construct a DFA for this language over the alphabet {0,1}.
I have constructed a DFA that consists of 4 states and that does not accept an empty word. However, in the answers they give a 3 state DFA that accepts it.
Why should my DFA accept the empty word? The empty word has no 1 at an odd position, so I would expect it not to be in the language.

The only requirement is that any symbol at an odd position must be 1. There is no requirement for a particular number of symbols, and specifically not that there be at least one.
Therefore, a DFA with an initial state where 0 leads to a rejecting state and where 1 leads to a second state, from which either symbol returns to the start, would be an acceptable answer, and it would accept the empty string. This is a three-state machine: the start state and the second state are accepting, and the rejecting state loops to itself on both symbols.
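A minimal Python sketch of the three-state machine described in this answer. The state names (odd_next, even_next, dead) and the function name dfa_accepts are made up for illustration; they do not come from the exercise.

```python
# Transition table of the three-state DFA.
# "odd_next": the next symbol sits at an odd position (start state).
# "even_next": the next symbol sits at an even position.
# "dead": a 0 was seen at an odd position; the word can no longer be accepted.
TRANSITIONS = {
    ("odd_next", "0"): "dead",
    ("odd_next", "1"): "even_next",
    ("even_next", "0"): "odd_next",
    ("even_next", "1"): "odd_next",
    ("dead", "0"): "dead",
    ("dead", "1"): "dead",
}
ACCEPTING = {"odd_next", "even_next"}


def dfa_accepts(word):
    """Run the DFA on a word over {0, 1} and report whether it accepts."""
    state = "odd_next"
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

Because the start state is itself accepting, dfa_accepts("") is True without any special case for the empty word.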

I think you are confused about why the empty string should be part of the set in question.
Let's take a look at another example. Consider the set of all strings in which every character equals 0. Such strings are 0, 00, 000, 00000, etc. What about the empty string? It belongs to this set as well, because the empty string does not violate the definition of the set.
Compare this example with yours. You check every odd position of the string, and if you find anything other than 1 there, the string is not an element of your set. Nothing is said about whether a string must have an odd position to check.
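The same point can be seen by writing the set's definition directly as a check. This is only a sketch; the function name in_language is made up, and positions are counted from 1 as in the question, so odd positions are indices 0, 2, 4, ... in Python.

```python
def in_language(word):
    """True if every odd position (1-based) of word holds the symbol 1."""
    # range(0, len(word), 2) is empty for the empty word, and all() over
    # an empty sequence is True: nothing violates the condition.
    return all(word[i] == "1" for i in range(0, len(word), 2))
```

For the empty word there is simply no position to check, so the condition holds vacuously.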

Related

Convert string to perfect number

Given a string, we need to find the largest square that can be obtained by replacing its characters with digits (leading zeros are not allowed), where the same character always maps to the same digit and different characters always map to different digits. If there is no solution, return -1.
Consider the string "ab". If we replace the character a with 8 and b with 1, we get 81, which is a square.
How can this be found for a given string? The string length is at most 11.
Please help me find a suitable and efficient way.
Sorry can't comment, not enough reputation for it so I'll answer here.
#mat7, about what you said in the question comments: no, you don't have to do it for every letter from a to z. You only have to do it for the letters present in your string (so at most 12 letters, not 26).
The first thing I would check is how many different letters you have. If there are 11 or 12 different letters you can return -1 directly, since there are only 10 digits and two different letters can't share the same digit.
Now, supposing the input string is "fdsadrtas", you build a new array containing each different letter once => "fdsart"
With this array you try all possibilities, excluding the obviously mismatching options: if you set 'f' to 4 and 'd' to 5, 's' can only be one of 12367890 (and 'f' can never be 0). This way you exclude lots of possibilities, with a worst case of 10! instead of 12^10. (Actually 9*9!, because the first letter can never be 0, but it's close enough.)
EDIT 2 : +1 samgak nice idea !
The last digit can only be 0,1,4,5,6,9, so the worst-case number of tests drops even further, to 9*6*8!
10! is easily small enough to brute-force: keep the highest square value you find and you are done.
EDIT :
Actually, it would work (in a finite, reasonable amount of time), but now that I have thought about it, it is the wrong approach.
You will spend less time looking at all the square numbers that could be a solution for your string. With the example I gave above, the string has length 9, so you check each square of length 9 to see whether it can be successfully mapped onto the string.
For a string of length 12 (the worst case) you have to check the squares of 316'228 to 999'999, which is far fewer than the more than 2 million checks of the previous proposal. The previous proposal might become faster if you start accepting longer strings, but with only 12 characters this way is faster.
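A rough Python sketch of this second approach: walk the squares with the right number of digits from largest to smallest and return the first one that maps onto the string. The function names matches and largest_square are made up for illustration, and returning -1 for the empty string is an assumption, since the question does not cover that case.

```python
from math import isqrt


def matches(pattern, digits):
    """True if digits is a one-to-one relabelling of pattern's characters."""
    to_digit, used = {}, set()
    for ch, d in zip(pattern, digits):
        if ch in to_digit:
            if to_digit[ch] != d:
                return False  # same letter, different digit
        elif d in used:
            return False  # different letter, same digit
        else:
            to_digit[ch] = d
            used.add(d)
    return True


def largest_square(pattern):
    n = len(pattern)
    if n == 0 or len(set(pattern)) > 10:
        return -1  # more distinct letters than digits
    high = isqrt(10 ** n - 1)  # largest root with an n-digit square
    low = isqrt(10 ** (n - 1) - 1) + 1 if n > 1 else 0  # smallest such root
    for root in range(high, low - 1, -1):
        square = root * root
        if matches(pattern, str(square)):
            return square  # first hit from the top is the largest
    return -1
```

Every candidate already has exactly n digits, so the no-leading-zero rule never needs a separate test.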

Minimum number of char substitutions to get a palindrome

I would like to solve this problem from TopCoder, in which a string is given and in each step you have to replace all occurrences of a character (of your choice) with another character (of your choice), so that after all steps you get a palindrome. The problem is to find the minimum total number of replacements.
Ideas so far:
I can see that the string after every step is simply a node/vertex in a graph and that the cost of every edge is the number of replacements made in the step, but I don't see how to use a greedy approach for that (it is definitely not the Minimum Spanning Tree problem). I don't think it makes sense to enumerate all possible nodes and edge costs and to convert the problem into the Shortest Path problem. On the other hand, I think in every step it makes sense to replace the character X with the biggest number of conflicts with the character Y, in conflict with X, that occurs most in the string.
Anyway, I can't prove that it works either. I also can't recognize any known problem in this. Any ideas?
You need to identify disjoint sets of characters. A disjoint set of characters is a set of characters that all have to become the same character for the string to become a palindrome.
Example:
Let's say we have the string abcdefgfmdebac
It has 3 disjoint sets: abc, de and fgm
Algorithm:
Pick the first character and check all occurrences of it, picking up the other characters that belong in its set.
In the example string we start with a and pick up b and c (because they sit at the positions mirroring the two a's in our string). We repeat the process for b and c, but no new characters are added to the set. So abc is our first disjoint set.
Continue doing this with the remaining characters.
A disjoint set whose characters occur n times in total needs n-m replacements, where m is the number of occurrences of the most frequent character in the set.
So simply sum over the sets.
In our example it takes 4 + 2 + 2 = 8 replacements.
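A possible Python sketch of this counting scheme. It uses a small union-find structure to build the sets, which is an implementation choice of this sketch rather than something the answer prescribes, and the function name min_replacements is made up.

```python
from collections import Counter


def min_replacements(text):
    """Sum, over each set of linked characters, of (occurrences - most frequent)."""
    parent = {ch: ch for ch in text}

    def find(ch):
        # Follow parent links to the set representative, halving the path.
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]
            ch = parent[ch]
        return ch

    # Characters at mirrored positions must end up equal, so merge their sets.
    for i in range(len(text) // 2):
        a, b = find(text[i]), find(text[-1 - i])
        if a != b:
            parent[a] = b

    total, biggest = {}, {}
    for ch, count in Counter(text).items():
        root = find(ch)
        total[root] = total.get(root, 0) + count
        biggest[root] = max(biggest.get(root, 0), count)
    return sum(total[root] - biggest[root] for root in total)
```

On the example string abcdefgfmdebac this gives the sets abc, de and fgm and the total 4 + 2 + 2 = 8.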

adversary argument for finding n-bit strings

Given:
S, a set of an odd number of n-bit strings
A, a particular n-bit string
show that any algorithm that decides whether A is in S must examine all n bits of A in the worst case.
Usually of course we would expect to have to look at all the parts of a string to do the matching, but there's something particular about S having an odd size that's escaping me.
Let's say we have an algorithm A that decides membership in S correctly and says, for any input n-bit string, whether the string is in S or not.
Suppose for a given input n-bit string s1, the algorithm A never looks at bit i of s1 and goes on to say "s1 is in (not in) S". Then a string s2 equal to s1 except with bit i flipped is also in (not in) S! That is, for any string we feed into A, if A doesn't look at a particular bit, then there is a second string also in (or not in) S with that bit flipped.
Then what is special about odd-sized sets S? We can't pair up the strings in S evenly. If A skipped at least one bit on every string in S, each string could be paired with the string obtained by flipping its first skipped bit, which is also in S, and |S| would be even. So there must be a string s3 in S that is left unpaired, which means A looked at all n bits of s3.
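The pairing idea can also be written as a short counting argument. This is a sketch under an assumption that is not in the question: the algorithm is modelled as a decision tree that reads one bit of the input per node, and d_ℓ is the number of distinct bits read on the way to leaf ℓ.

```latex
% A leaf \ell reached after reading d_\ell distinct bits is reached by exactly
% 2^{n - d_\ell} inputs, and all of them receive the same answer, so
\[
  |S| \;=\; \sum_{\ell \,\in\, \text{accepting leaves}} 2^{\,n - d_\ell}.
\]
% If every accepting leaf had d_\ell \le n - 1, every term would be even:
\[
  d_\ell \le n - 1 \ \text{for all } \ell
  \;\Longrightarrow\;
  2 \mid 2^{\,n - d_\ell} \ \text{for all } \ell
  \;\Longrightarrow\;
  2 \mid |S|.
\]
% This contradicts |S| being odd, so some accepting leaf has d_\ell = n.
```

So for odd |S| there is at least one input on which all n bits are read.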
I guess the odd number clue is to find the end of your set or array in memory.
Assume you are using a 32 bit system,
Perhaps the compiler aligns the data structures of your program in memory on eight-byte boundaries. You have a whole load of string pointers in your data segment. If there is an odd number of strings, the next thing that needs an eight-byte alignment has four bytes of padding in front of it. If there is an even number of strings, there is no padding.
If I understand this correctly, it's irrelevant whether S has an odd or even number of strings. To check whether any particular string in S matches an arbitrary string A, you must compare the two character by character. You can stop early if either string is shorter than the other or a character you're checking doesn't match.

is a* the same as (a*)*?

Quick question,
if a is a regular expression then is it true that a* = (a*)* ?
Is (a*)* a valid expression? If it is, then can anyone explain why is it the same as a*? I apologize for asking here, but I couldn't find anything via Google.
Yes, a* and (a*)* are the same. Both generate the same language: strings consisting of any number of a's, including the empty string (written ^ here).
L(a*) = {^, a, aa, aaa, ...} = L((a*)*)
Is (a*)* a valid expression?
Yes. This kind of expression is called a regular expression (I noticed the tag was missing). Any regular language (RL) can be represented by a regular expression (RE); it is a way of writing down an RL using symbols of the alphabet.
why is it the same?
* means repetition any number of times (including 0 times).
a* means 0 a's, 1 a, 2 a's or any number of a's.
(a*)* means repeating any strings from the set a* any number of times (including 0 times).
Because L(a*) is the set of all strings made up of a's, it is a superset of every set of strings of a's. Concatenating any number of strings of a's gives another string of a's, so L((a*)*) is the same set.
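As a sanity check, the equality can be tested on all strings up to some length. The sketch below computes the Kleene star of a finite set of strings, truncated at max_len; the function name star and the truncation are assumptions of this sketch, not standard library features.

```python
def star(language, max_len):
    """All concatenations of words from language with length <= max_len."""
    result = {""}  # zero repetitions always gives the empty string
    frontier = {""}
    while frontier:
        new = set()
        for prefix in frontier:
            for word in language:
                candidate = prefix + word
                if len(candidate) <= max_len and candidate not in result:
                    new.add(candidate)
        result |= new
        frontier = new
    return result


a_star = star({"a"}, 5)        # strings of a's up to length 5
a_star_star = star(a_star, 5)  # star applied a second time
```

Here a_star and a_star_star are the same set; applying star again adds nothing, which is the bounded version of a* = (a*)*.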

Tokenize valid words from a long string

Suppose you have a dictionary that contains valid words.
Given an input string with all spaces removed, determine whether the string is composed of valid words or not.
You can assume the dictionary is a hashtable that provides O(1) lookup.
Some examples:
helloworld -> hello world (valid)
isitniceinhere -> is it nice in here (valid)
zxyy -> invalid
If a string has multiple possible parsings, returning true is sufficient.
The string can be very long, so think of an algorithm that is both space and time efficient.
I think the set of all strings that occur as the concatenation of valid words (words taken from a finite dictionary) form a regular language over the alphabet of characters. You can then build a finite automaton that accepts exactly the strings you want; computation time is O(n).
For instance, let the dictionary consist of the words {bat, bag}. Then we construct the following automaton: states are denoted by 0, 1, 2. Edges: (0,1,b), (1,2,a), (2,0,t), (2,0,g); where the triple (x,y,z) means an edge leading from x to y on input z. The only accepting state is 0. In each step, on reading the next input symbol, you have to calculate the set of states that are reachable on that input. Given that the number of states in the automaton is constant, this is of complexity O(n). As for space complexity, I think you can do with O(number of words) with the hint for construction above.
For another example, with the words {bag, bat, bun, but}, the same construction would give states 0, 1, 2, 3 and edges (0,1,b), (1,2,a), (1,3,u), (2,0,g), (2,0,t), (3,0,n), (3,0,t).
Supposing that the automaton has already been built (the time to do this has something to do with the length and number of words :-) we now argue that the time to decide whether a string is accepted by the automaton is O(n) where n is the length of the input string.
More formally, our algorithm is as follows:
1. Let S be a set of states, initially containing the starting state.
2. Read the next input character, let us denote it by a.
3. For each element s in S, determine the state that we move into from s on reading a; that is, the state r such that with the notation above (s,r,a) is an edge. Let us denote the set of these states by R. That is, R = {r | s in S, (s,r,a) is an edge}.
(If R is empty, the string is not accepted and the algorithm halts.)
4. If there are no more input symbols, check whether any of the accepting states is in R. (In our case, there is only one accepting state, the starting state.) If so, the string is accepted, if not, the string is not accepted.
5. Otherwise, take S := R and go to 2.
Now, there are as many executions of this cycle as there are input symbols. The only thing we have to examine is that steps 3 and 5 take constant time. Given that the size of S and R is not greater than the number of states in the automaton, which is constant and that we can store edges in a way such that lookup time is constant, this follows. (Note that we of course lose multiple 'parsings', but that was not a requirement either.)
I think this is actually called the membership problem for regular languages, but I couldn't find a proper online reference.
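A small Python sketch of the automaton-based answer above. It builds the automaton so that words share prefixes and the last letter of every word leads back to state 0, then runs the set-of-states loop. The names build_automaton and accepts are made up, and the prefix-sharing construction is one reading of the examples in the answer, not a construction the answer spells out.

```python
def build_automaton(words):
    """edges[state][char] is the set of states reachable on char; state 0 is
    the start state and the only accepting state."""
    edges = [{}]
    child = [{}]  # child[state][char]: state reached by a proper prefix
    for word in words:
        if not word:
            continue
        state = 0
        for ch in word[:-1]:
            if ch not in child[state]:
                edges.append({})
                child.append({})
                child[state][ch] = len(edges) - 1
                edges[state].setdefault(ch, set()).add(len(edges) - 1)
            state = child[state][ch]
        # The last letter of a word closes the word and returns to the start.
        edges[state].setdefault(word[-1], set()).add(0)
    return edges


def accepts(edges, text):
    """Track the set of reachable states while reading text once."""
    current = {0}
    for ch in text:
        current = {r for s in current for r in edges[s].get(ch, ())}
        if not current:
            return False  # no state is reachable: reject early
    return 0 in current
```

When one word is a prefix of another (for example is and isit), the same letter leads both back to state 0 and on to a deeper state, which is why a set of states is tracked rather than a single state.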
I'd go for a recursive algorithm with implicit backtracking. Function signature: f: input -> result, with input being the string, result either true or false depending if the entire string can be tokenized correctly.
Works like this:
1. If input is the empty string, return true.
2. Look at the length-one prefix of input (i.e., the first character). If it is in the dictionary, run f on the suffix of input. If that returns true, return true as well.
3. If the length-one prefix from the previous step is not in the dictionary, or the invocation of f in the previous step returned false, make the prefix longer by one and repeat at step 2. If the prefix cannot be made any longer (already at the end of the string), return false.
Rinse and repeat.
For dictionaries with low to moderate amount of ambiguous prefixes, this should fetch a pretty good running time in practice (O(n) in the average case, I'd say), though in theory, pathological cases with O(2^n) complexity can probably be constructed. However, I doubt we can do any better since we need backtracking anyways, so the "instinctive" O(n) approach using a conventional pre-computed lexer is out of the question. ...I think.
EDIT: the estimate for the average-case complexity is likely incorrect, see my comment.
Space complexity would be only stack space, so O(n) even in the worst-case.
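A short Python sketch of the recursive function f described in this answer. The name can_tokenize is made up, the dictionary is assumed to be a Python set, and the lru_cache memoization on the start index is an addition of this sketch that the answer does not mention; it keeps the same suffix from being explored more than once.

```python
from functools import lru_cache


def can_tokenize(text, dictionary):
    """True if text can be split into words that are all in dictionary."""

    @lru_cache(maxsize=None)
    def f(start):
        if start == len(text):
            return True  # the empty suffix is trivially tokenized
        # Try ever longer prefixes of the remaining suffix.
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary and f(end):
                return True
        return False  # no prefix works: backtrack

    return f(0)
```

For very long inputs the recursion depth can exceed Python's default limit, so an iterative version over start indices may be preferable there.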
