is a* the same as (a*)*? - algorithm

Quick question,
if a is a regular expression then is it true that a* = (a*)* ?
Is (a*)* a valid expression? If it is, then can anyone explain why is it the same as a*? I apologize for asking here, but I couldn't find anything via Google.

Yes, a* and (a*)* are the same. Both generate the same language: the strings made of any number of a's, including the empty string (written ^ below).
L(a*) = {^, a, aa, aaa, ...} = L((a*)*)
Is (a*)* a valid expression?
Yes, it is a valid regular expression (I saw you missed the regular-expression tag): the star can be applied to any regular expression, including a*. Any regular language (RL) can be represented by a regular expression (RE), which is a textual way of representing an RL.
why is it the same?
* means repetition any number of times (including 0 times).
a* means 0 a's, 1 a, 2 a's, or any number of a's.
(a*)* means repeating strings from the set L(a*) any number of times (including 0 times).
L(a*) already contains every string made only of a's, so it is a superset of every set of strings of a's. Concatenating strings of a's gives only strings of a's, so L((a*)*) adds nothing new and is the same set.
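The equality can also be checked mechanically up to a length bound. The following is a minimal sketch, not part of the original answer: `star` is a hypothetical helper that computes the Kleene star of a finite set of strings, truncated at `max_len`.

```python
def star(lang, max_len):
    """Kleene star of a finite set of strings, keeping only words of length <= max_len."""
    result = {""}          # zero repetitions always yields the empty string
    frontier = {""}
    while frontier:
        # Append one more word from lang to everything found in the last round.
        frontier = {
            u + v
            for u in frontier
            for v in lang
            if len(u + v) <= max_len
        } - result
        result |= frontier
    return result


a_star = star({"a"}, 4)          # L(a*) up to length 4
a_star_star = star(a_star, 4)    # L((a*)*) up to length 4
print(a_star == a_star_star)
```

The bound only makes the sets finite; the argument in the answer is what shows the equality for all lengths.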

Related

{ w | at every odd position of w is a 1}

The task is to construct a DFA for this language over the alphabet {0,1}.
I have constructed a DFA that consists of 4 states and that does not accept an empty word. However, in the answers they give a 3 state DFA that accepts it.
Why should my DFA accept an empty word if in the empty word there is no 1 at the odd position which means that it is not in the language?
The only requirement is that any symbol at an odd position must be 1. There is no requirement for a particular number of symbols, and specifically not that there be at least one.
Therefore, a DFA with an initial state where 0 leads to a rejecting state and where 1 leads to a second state, which accepts either symbol and returns to the start, would be an acceptable answer, and it would accept the empty string. This is a three-state machine: the accepting start state, the accepting second state, and the rejecting trap state.
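As a rough illustration, the machine described in this answer can be simulated with a transition table. The state names (`expect_one`, `any_symbol`, `dead`) are made up for this sketch; only the transitions come from the description.

```python
# Transition table of the three-state DFA over the alphabet {0, 1}.
TRANSITIONS = {
    ("expect_one", "1"): "any_symbol",  # odd position holds a 1: fine
    ("expect_one", "0"): "dead",        # odd position holds a 0: reject forever
    ("any_symbol", "0"): "expect_one",  # even positions may hold anything
    ("any_symbol", "1"): "expect_one",
    ("dead", "0"): "dead",
    ("dead", "1"): "dead",
}
ACCEPTING = {"expect_one", "any_symbol"}


def accepts(word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = "expect_one"  # start state; accepting, so the empty word is accepted
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


print(accepts(""), accepts("10"), accepts("100"))
```

The empty word never leaves the start state, which is accepting, so no special case is needed for it.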
I think you are confused about why the empty string should be part of this set.
Let's take a look at another example. Consider the set of all strings in which every character equals 0. Such strings are 0, 00, 000, 00000, etc. What about the empty string? It belongs to this set as well: the empty string does not violate the definition of the set.
Compare this example with yours. You check every odd position of the string, and if you find anything other than 1, the string is not an element of your set. Nothing is said about whether a string must have an odd position to check.

Is there an algorithm for choosing a few strings from a list so that the number of strings equal the number of different letters in them?

Edit2: I think the solution of David Eisenstat works but I will check it before I call the question solved.
Example list of strings:
1.) "a"
2.) "ab"
3.) "bc"
4.) "dc"
5.) "efa"
6.) "ef"
7.) "gh"
8.) "hi"
You can choose number 1.) there's 1 string and 1 letter in it: "a"
You can also choose 1.) and 2.) these are 2 strings with only two different letters in them "a" and "b"
other valid string combinations:
1.) 2.) 3.)
1.) 5.) 6.)
there's no valid combination with "h" (it would be ideal if cases like this could be proven however you can assume the program only needs to work when there's a valid answer)
There could be an extra condition that the strings you choose must include one specified letter, however simply finding all the possible combinations would solve the problem just as well. eg. specified letter "c" the only solution in this case would be: 1.) 2.) 3.)
[optional information] The purpose of this: I want to make a program which can choose, from a big list of equations (probably around 100), which ones can be used to solve for a variable. Each equation gets one string, with each letter in the string representing one unknown. The equations in the list are all independent, i.e. they cannot be derived from each other, so you need as many equations as there are unknowns in them. Solving for the unknowns will be done in a CAS, so you don't need to worry about it. However, I believe the CAS (Maxima) might have a limit on how many equations it can solve simultaneously, and it might be too slow if you give it too many unnecessary equations at a time.
As a start I would use an algorithm to reduce the number of strings just to make it faster. First all strings containing specified letter are in the reduced list, then all strings containing the letters from the strings in the reduced list are part of the reduced list until none is added. eg reduced list of "g" would be 7.) "gh" and 8.) "hi" This would only remove some unnecessary strings, but the task would remain the same with the rest.
I think this can be solved by taking away unnecessary strings from the reduced list until all the remaining are needed, however I don't know how to explicitly define which strings would be unnecessary (except for those mentioned in the previous paragraph).
If you work with the extra condition: this is an optimization task, but I don't need a perfect solution, only a good enough one. The program doesn't need to find the absolute minimum number of strings that give a solution. Having a few extra strings in the solution would probably only slow the computer down, and that would be acceptable.
Edit: Optional clarification about the meaning of the strings: Each letter in a string represent an unknown in an equation so the equation a=2 would be represented by "a" because that's the only unknown. The equation a+b=0 would be represented by "ab" and b^2-c=0 by "bc"
I'm not sure what to call this problem. It seems NP-hard, so I'm going to suggest an integer programming formulation, which can be attacked by an off-the-shelf solver.
Let x_i be a 0-1 variable indicating whether equation i is included in the output. Let y_j be a 0-1 variable indicating whether variable j is included in the output. We have constraints
for all equations i, for all variables j in equation i, y_j - x_i >= 0.
We need as many equations as variables in the output.
(sum over all equations i of x_i) - (sum over all variables j of y_j) = 0
As you point out, the empty set needs specifically to be disallowed. Let k be a variable that must appear in the output.
sum over all equations i containing variable k of x_i >= 1
Naturally, the objective is
minimize sum over all equations i of x_i.
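For a list as small as the example in the question, the same constraints can be checked by brute force, which is handy for validating a solver model. This is only a sketch and not the integer program itself: it enumerates subsets in order of size, so it is exponential and unsuitable for around 100 equations. The function name `smallest_square_subset` is made up for this illustration.

```python
from itertools import combinations


def smallest_square_subset(strings, required=None):
    """Return indices of a smallest subset with as many strings as distinct letters.

    If required is given, that letter must occur in the chosen strings.
    Returns None when no such subset exists.
    """
    for size in range(1, len(strings) + 1):
        for combo in combinations(range(len(strings)), size):
            letters = set().union(*(strings[i] for i in combo))
            if len(letters) == size and (required is None or required in letters):
                return combo
    return None


equations = ["a", "ab", "bc", "dc", "efa", "ef", "gh", "hi"]
print(smallest_square_subset(equations, required="c"))
```

Indices are 0-based here, so the strings numbered 1.) 2.) 3.) in the question correspond to indices 0, 1, 2.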

Generating strings from context free grammars

The problem is implementing an algorithm, that generates all strings with length between l and r from given context free grammar G.
I have come up with simple approach: run BFS on grammar graph, remembering states.
But it fails on some recursive rules:
(1) S -> 0 | SSS | λ
I can't simply limit maximum string length, because rules can contain λ (empty strings), so non-terminals can reduce final string length. (eg. running (1) with l = 1, r = 2 will output only 0 in my implementation)
I also tried to limit maximum number of applied rules, but it is obviously wrong too.
How can I limit or change my algorithm so that it never goes into an endless loop and works correctly?
You can transform the grammar to Greibach normal form. Then each step (1) in the derivation increases the size of the produced word, and you will be able to limit the length of the word as initially explained in the question.
(1) except possibly the first, if the empty word is in the language of the grammar
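A different way to get termination, without converting to Greibach normal form, is a bounded fixed-point iteration: for every non-terminal, keep the set of terminal strings of length at most r found so far, and re-apply the rules until nothing new appears. This is a swapped-in technique, not the one suggested above, and the grammar encoding (a dict from non-terminal to a list of right-hand sides, with the empty list standing for λ) is an assumption of this sketch.

```python
def generate(grammar, start, lo, hi):
    """All strings w derivable from start with lo <= len(w) <= hi."""
    # lang[nt] holds the strings of length <= hi known to be derivable from nt.
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                partial = {""}  # an empty right-hand side (lambda) yields ""
                for sym in rhs:
                    # Non-terminals contribute their known strings, terminals themselves.
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {
                        p + c
                        for p in partial
                        for c in choices
                        if len(p) + len(c) <= hi
                    }
                new = partial - lang[nt]
                if new:
                    lang[nt] |= new
                    changed = True
    return sorted((w for w in lang[start] if len(w) >= lo), key=lambda w: (len(w), w))


# Rule (1) from the question: S -> 0 | SSS | lambda
grammar = {"S": [["0"], ["S", "S", "S"], []]}
print(generate(grammar, "S", 1, 2))
```

The loop stops because each set can only grow and there are finitely many strings of length at most r. Truncating at r loses nothing, since every piece of a string of length at most r is itself no longer than r.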

Make palindrome from given word

I am given a word like abca. I want to know how many letters I need to add to make it a palindrome.
In this case it's 1, because if I add b, I get abcba.
First, let's consider an inefficient recursive solution:
Suppose the string is of the form aSb, where a and b are letters and S is a substring.
If a==b, then f(aSb) = f(S).
If a!=b, then you need to add a letter: either add an a at the end, or add a b in the front. We need to try both and see which is better. So in this case, f(aSb) = 1 + min(f(aS), f(Sb)).
This can be implemented with a recursive function which will take exponential time to run.
To improve performance, note that this function will only be called with substrings of the original string. There are only O(n^2) such substrings. So by memoizing the results of this function, we reduce the time taken to O(n^2), at the cost of O(n^2) space.
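A minimal sketch of the memoized recursion, assuming that a string of length 0 or 1 needs no insertions (the base case is not spelled out above). Substrings are represented by index pairs `(i, j)` rather than copies, and the name `min_insertions` is made up for this example.

```python
from functools import lru_cache


def min_insertions(word):
    """Minimum number of letters to insert into word to make it a palindrome."""

    @lru_cache(maxsize=None)
    def f(i, j):
        # f(i, j) handles the substring word[i..j], both ends inclusive.
        if i >= j:
            return 0  # empty or single-letter substrings are palindromes
        if word[i] == word[j]:
            return f(i + 1, j - 1)  # case a == b: f(aSb) = f(S)
        # case a != b: f(aSb) = 1 + min(f(aS), f(Sb))
        return 1 + min(f(i, j - 1), f(i + 1, j))

    return f(0, len(word) - 1)


print(min_insertions("abca"))
```

Because the recursion is not iterative, very long inputs may hit Python's recursion limit; a bottom-up table over substring lengths avoids that at the same O(n^2) cost.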
The basic algorithm would look like this:
Iterate over half of the string and check whether the same character exists at the mirrored position at the other end (i.e., if you have abca, then the first character is an a and the string also ends with a).
If they match, then proceed to the next character.
If they don't match, then note that a character needs to be added.
Note that you can only move backwards from the end when the characters match. For example, if the string is abcdeffeda, then the outer characters match. We then need to consider bcdeffed. The outer characters don't match, so a b needs to be added. But we don't want to continue with cdeffe (i.e., removing/ignoring both outer characters); we simply remove b and continue by looking at cdeffed. Similarly for c, and this means our algorithm returns 2 string modifications and not more.

Check if given string can be created by a set of characters cut out from magazine article

"Observe that when you cut a character out of a magazine, the character on the reverse side of the page is also removed. Give an algorithm to determine whether you can generate a given string by pasting cutouts from a given magazine. Assume that you are given a function that will identify the character and its position on the reverse side of the page for any given character position."
How can I do it?
I can do some initial pruning so that if a needed character has only one way of getting picked up, its taken initially before turning the sub-problem for dynamic technique, but what after this initial pruning?
What is the time and space complexity?
As @LiKao suggested, this can be solved using max flow. To construct the network we make two "layers" of vertices: one with all the distinct characters in the input string and one with each position on the page. Make an edge with capacity 1 from a character to a position if that position has that character on one side. Make edges of capacity 1 from each position to the sink, and make edges from the source to each character with capacity equal to the multiplicity of that character in the input string.
For example, let's say we're searching for the word "FOO" on a page with four positions:
pos    1  2  3  4
front  F  C  O  Z
back   O  O  K  Z
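We then generate the following network, ignoring position 4 since it does not provide any of the required characters: the source has an edge of capacity 1 to F and an edge of capacity 2 to O; F connects to position 1; O connects to positions 1, 2, and 3; and each of positions 1 to 3 has an edge of capacity 1 to the sink.
Now, we only need to determine whether there is a flow from the source to the sink of value length("FOO") = 3 or more.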
You can use dynamic programming directly.
We are given string s with n letters. We are given a set of pieces P = {p_1, ..., p_k}. Each piece has one letter in the front p_i.f and one in the back p_i.b.
Denote with f(j, p) the function that returns true if it is feasible to create substring s_1...s_j using pieces in p \subseteq P, and false otherwise.
The following recurrence holds:
f(n, P) = f(n-1, P-p_1) | f(n-1, P-p_2) | ... | f(n-1, P-p_k)
In plain English the feasibility of s using all pieces in P, depends on the feasibility of the substring s_1...s_n-1 given one less piece, and we try removing all possible pieces (of course in practice we do not have to remove all pieces one by one; we only need to remove those pieces for which p_i.f == s_n || p_i.b == s_n).
The initial condition is that f(1, P-p_1) = f(1, P-p_2) = ... = true, assuming that we have already checked a priori (in linear time) that there are enough letters in P to cover all the letters in s.
While this problem can be formulated as a max-flow problem as shown in the accepted answer, it is simpler and more efficient to formulate it as a maximum cardinality matching problem in a bipartite graph. General max-flow algorithms like Dinic's are slower than special-case algorithms like Hopcroft–Karp.
The bipartite graph is formed by adding two edges from every character of the given string to a cutout, one edge for each side. We then run Hopcroft–Karp. In the end, we simply check whether the cardinality of the matching is equal to the length of the string.
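To make the matching formulation concrete, here is a small sketch that uses the simple augmenting-path method (often called Kuhn's algorithm) instead of Hopcroft–Karp; it is slower in the worst case but much shorter. The representation of the magazine as a list of `(front, back)` pairs and the name `can_build` are assumptions of this example.

```python
def can_build(target, pieces):
    """True if every character of target can be matched to a distinct cutout.

    pieces is a list of (front, back) characters, one pair per page position.
    """
    match = [-1] * len(pieces)  # match[p] = index in target served by piece p

    def try_assign(i, seen):
        # Look for a piece for target[i], re-routing earlier characters if needed.
        for p, (front, back) in enumerate(pieces):
            if target[i] in (front, back) and p not in seen:
                seen.add(p)
                if match[p] == -1 or try_assign(match[p], seen):
                    match[p] = i
                    return True
        return False

    # The matching covers the string only if every character finds a piece.
    return all(try_assign(i, set()) for i in range(len(target)))


# The four-position page from the max-flow example.
page = [("F", "O"), ("C", "O"), ("O", "K"), ("Z", "Z")]
print(can_build("FOO", page))
```

On this page "FOO" works (F from position 1, O from positions 2 and 3), while "FOOO" does not, because the third O would need position 1, which F already uses.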
For a working implementation (in Scala) using JGraphT, see my GitHub.
I'd like to come up with a more efficient DP solution, since Skiena's book has this problem in the DP section, but so far haven't found any.
